A horse stands in a stall equipped with a surveillance camera and a microphone. Arrows show the data being sent to a speaker and to a display showing video footage and a chart, suggesting remote monitoring technology.

Savonia Article Pro: Horse Stall Data Acquisition System: A Scalable and Reliable Audio-Visual Recording and Transfer Framework

Savonia Article Pro is a collection of multidisciplinary Savonia expertise on various topics.

This work is licensed under CC BY-SA 4.0.

1. Introduction

Our goal in the “Tekoälyä Talleille” project is to develop an AI system that recognizes the abnormal sounds made by horses in stables. To this end, we are initially building a system that collects the necessary sound data for the seven-horse section at Rauhalahti Riding School. In a previous article, we detailed the equipment selection and rationale. This article focuses on the setup and technical architecture of the data collection framework.

The system is built using low-cost Raspberry Pi devices, reliable Linux tooling, and a structured scheduled recording and data transfer mechanism. It supports continuous operation, is field-deployable, and ensures data readiness for AI tasks such as acoustic behavior modeling and video analysis.

2. Objectives

The primary goal of the Horse Barn Data Collection System is to automate the continuous collection of synchronized audio and video data from each horse throughout the night. For each horse, the system is configured to record 30-minute sessions from 7:00 PM to 9:00 AM. During this time, both audio and video streams are recorded in parallel using connected USB microphones (Rode Videomic Go II) and cameras (Arducam 2MP IMX462 Day and IR Night Vision USB Camera).

The system is specifically designed to support the storage of high-quality audio (in mono .flac format) and optimized video streams suitable for post-processing, monitoring, and AI model training. To address network constraints (especially in rural environments where mobile data may be the only connection), the framework includes an efficient video compression mechanism. This significantly reduces file sizes before transferring data to Savonia’s central NVIDIA DGX server.

The system is scalable and supports plug-and-play deployment with minimal configuration. It prioritizes reliability, maintainability, and synchronization across multiple Raspberry Pi units.

The diagram shows four rooms, each with a horse, a microphone, and a camera. Mini computers are connected to the cameras with USB cables and to a router with Ethernet cables. Wall sockets and power cables are also shown.
Figure 1: Horse Barn Data Collection System Infrastructure

The system capabilities include:

  • Continuous 30-minute interval recordings daily between 7:00 PM and 9:00 AM.
  • Synchronized audio and video support from 1-2 USB microphones and 1-2 USB cameras.
  • Automatic compression of large video files to meet network constraints.
  • Daily transfer of recorded files to a Savonia NVIDIA DGX server for storage and analysis.
  • Scalable deployment across multiple Raspberry Pi units.

3. System Architecture

3.1. Hardware

Each data collection unit is built using a Raspberry Pi 5 Model B, chosen for its balance of affordability and performance. Depending on the stall’s layout, the Raspberry Pi can support up to two microphones and two cameras. Audio quality is prioritized for AI research, while video serves only to support the labeling process by visually recording the source of each sound. A 128 GB microSD card holds recordings locally until the transfer for each session is completed.

Connectivity is provided via Wi-Fi or Ethernet. In remote barns where wired internet is unavailable, mobile WAN solutions (4G USB modems, or a “ZTE MU5001 4G/5G” or “Advantech ICR-4400” router) provide the network uplink for remote management and file transfer to the Savonia NVIDIA DGX server.

A cluttered workbench with several power outlets, tangled cables, various electronic devices, USB hubs, routers with antennas, and network adapters plugged into the outlets. The area appears to be full of technical equipment.
Figure 2: Testing the system before installation

3.2. Software Stack

The system runs Raspberry Pi OS Lite. Audio and video are captured using ffmpeg. Task automation is handled by systemd timers. rsync over SSH handles secure data transfer. flock prevents process overlap. Logging and configuration management are built around Bash and GitLab.

Flow diagram in which FFmpeg, systemd, rsync, and flock feed into Bash, which connects to GitLab. The function of each tool is summarized beneath its icon, illustrating the automation and version control workflow.
Figure 3: Key software components
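
Because every script depends on the tools listed above, a unit can verify that they are installed before the first recording night. The following is a minimal sketch; the function name check_tools is hypothetical and not taken from the project's scripts.

```shell
#!/usr/bin/env bash
# Report which of the given command-line tools are missing from PATH.
# check_tools is a hypothetical helper name used for illustration.

check_tools() {
    local missing=0 tool
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "missing: $tool" >&2
            missing=1
        fi
    done
    # Returns 0 when every tool was found, 1 otherwise
    return "$missing"
}

# Example call for the stack described in this section:
#   check_tools ffmpeg rsync flock systemctl ssh git
```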

3.3. Directory Structure

To ensure clean data organization, each Raspberry Pi follows a unified directory structure:

Directory tree of /home/barn/recordings/ showing four folders: audio, video, logs, and scripts, with a short description of the contents of each on the right.
Figure 4: Directory organization in Raspberry Pi

This structure aids in debugging, transfer automation, and system scaling.
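
As a rough sketch, the layout could be created on a new unit as follows. The folder names follow Figure 4; the function name create_layout is an illustrative assumption rather than part of the project's code.

```shell
#!/usr/bin/env bash
# Create the per-device directory layout described in Figure 4.
# create_layout is a hypothetical helper; the default root matches the
# path used elsewhere in this article.

create_layout() {
    local root="${1:-/home/barn/recordings}"
    local dir
    for dir in audio video logs scripts; do
        # -p makes the call idempotent: existing folders are left untouched
        mkdir -p "$root/$dir" || return 1
    done
}
```

Calling create_layout with no argument targets /home/barn/recordings; passing another path is convenient for testing on a development machine.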

4. Recording Workflow

The recording process is fully automated using systemd services and timers. Each Raspberry Pi starts a new audio and video recording every 30 minutes between 19:00 and 09:00, creating synchronized 30-minute media files. When an audio or video recording is completed, a .done file with the same name as the recording is created. The transfer session uses this .done file to identify recordings that are complete and ready to send.

The flowchart shows a systemd timer that starts two services: the video recording service creates an .mp4 video and marks it as done. All text is pink on a black background.
Figure 5: Recording workflow
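
The marker logic can be sketched independently of the recorder itself. In the example below, record_and_mark is a hypothetical helper, and naming the marker by appending .done to the recording's file name is an assumption about how "the same name" is realized.

```shell
#!/usr/bin/env bash
# Run a recording command and create a ".done" marker only if it succeeded.
# record_and_mark is a hypothetical helper; the "<file>.done" naming is an
# assumption made for this sketch.

record_and_mark() {
    local out_file="$1"
    shift
    # The remaining arguments are the recording command, e.g. an ffmpeg call
    if "$@"; then
        touch "${out_file}.done"
    else
        echo "recording failed: $out_file" >&2
        return 1
    fi
}

# Example call (variables as in the audio command in Section 4.1):
#   record_and_mark "$AUDIO_FILE" ffmpeg -f pulse -i "$MIC_NAME" \
#       -ac 1 -ar 16000 -t "$DURATION" -c:a flac "$AUDIO_FILE"
```

Creating the marker only after the recorder exits successfully means a file that is still being written, or that failed midway, is never picked up by the transfer step.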

4.1. Audio

Audio is recorded in mono using ffmpeg, sourced from microphones. The recordings are saved in flac format to preserve quality while reducing file size. The typical file size for each 30-minute audio session is approximately 20 MiB.

Example command (simplified):

ffmpeg -f pulse -i "$MIC_NAME" -ac 1 -ar 16000 -t "$DURATION" -c:a flac "$AUDIO_FILE"

4.2. Video

Video is captured in grayscale using USB cameras at 640×480 resolution and 15 fps. The raw recording is encoded with H.264 (libx264) to balance quality and speed. Each raw 30-minute video is about 370 MiB in size.

Example command (simplified):

ffmpeg -f v4l2 -framerate 15 -video_size 640x480 -i "$CAM_DEV" -vf format=gray -t "$DURATION" -c:v libx264 -preset ultrafast "$VIDEO_FILE"

5. Compression Strategy

To reduce storage usage and speed up transfer over limited-bandwidth connections, all video files are compressed before transfer using a slower ffmpeg preset and a higher CRF (Constant Rate Factor) value than the libx264 default of 23. This compression can reduce file sizes from about 370 MiB to about 35 MiB per file, with minimal visible quality loss.

Example compression logic:

ffmpeg -y -i "$DATA_FILE" -c:v libx264 -preset slow -crf 28 "$COMPRESSED_FILE"

Original files are deleted upon successful compression, and the compressed file is renamed to match the original naming scheme.

Flow diagram of video compression: a raw MP4 video (~370 MiB) is compressed with ffmpeg at CRF 28, producing a smaller MP4 (~35 MiB). The original file is deleted and the compressed file is renamed.
Figure 6: Compression workflow
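
The delete-and-rename step is the part most likely to lose data if it goes wrong, so a sketch of one safe ordering may help. The function name compress_in_place and the .tmp.mp4 suffix are hypothetical; the ffmpeg options mirror the compression command shown above.

```shell
#!/usr/bin/env bash
# Compress a video into a temporary file and replace the original only if
# ffmpeg succeeded and produced a non-empty file.
# compress_in_place and the ".tmp.mp4" suffix are hypothetical names.

compress_in_place() {
    local src="$1"
    local tmp="${src%.mp4}.tmp.mp4"
    if ffmpeg -y -i "$src" -c:v libx264 -preset slow -crf 28 "$tmp" \
            </dev/null >/dev/null 2>&1 && [ -s "$tmp" ]; then
        # mv overwrites the raw file, so the result keeps the original name
        mv -f "$tmp" "$src"
    else
        # Keep the raw recording and discard the incomplete output
        rm -f "$tmp"
        return 1
    fi
}
```

With this ordering, the raw recording is only replaced once a complete compressed file exists, so a crash or power loss during encoding leaves the original intact.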

6. Transfer Pipeline

All Raspberry Pi units transfer the files from the previous 30-minute session using a dedicated systemd timer and rsync over SSH. This approach uses bandwidth efficiently and supports resuming interrupted transfers.

On each Raspberry Pi, the transfer session begins after 30 minutes of recording. To use bandwidth efficiently, each of the four Raspberry Pi devices has been assigned a separate sending time slot, and the devices start sending 5 minutes apart.
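
One way to derive the staggered start times is sketched below. It assumes that sessions end on the hour and half hour and that devices are numbered from 0; the function name transfer_minutes and this numbering are illustrative assumptions, not the project's actual configuration.

```shell
#!/usr/bin/env bash
# Derive the two transfer start minutes per hour for one device, assuming
# sessions end at :00 and :30 and devices are staggered by a fixed step.
# transfer_minutes and the 0-based device index are hypothetical.

transfer_minutes() {
    local index="$1"        # 0 for the first device, 1 for the second, ...
    local step="${2:-5}"    # stagger between devices, in minutes
    local first=$(( index * step ))
    if [ "$first" -ge 30 ]; then
        echo "offset does not fit in a 30-minute slot" >&2
        return 1
    fi
    printf '%02d,%02d\n' "$first" $(( first + 30 ))
}
```

For example, transfer_minutes 2 prints 10,40, which could be used in a timer as OnCalendar=*-*-* *:10,40:00.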

When the transfer session begins, the system first looks for .done files. For each .done file found, it sends the audio or video file with the same name. Audio files are sent without any processing, while video files are first compressed and then sent. After a transfer completes successfully, the .done file and the corresponding audio or video file are removed.

The system writes a log entry at each step of the transfer session.

Key features of transfer pipeline:

  • Transfers start after each 30-minute recording session ends.
  • Audio and video files are sent to structured remote folders in the Savonia DGX server (e.g., /data/audio/ and /data/video/).
  • Logs are kept for all transfers, including failures.
  • SSH key authentication is supported for security and convenience.

Flowchart of the automated process: it checks for .done files, determines the file type (audio or video), compresses videos, sends the files via rsync, deletes the original files if the transfer succeeds, and logs the result.
Figure 7: Transfer workflow
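
The workflow in Figure 7 can be sketched as a single loop. Everything named here is an assumption made for illustration: the function names transfer_ready and compress_video, the argument order, and the convention that the marker is the recording's file name plus .done. The remote sub-folders audio/ and video/ follow the example paths given above.

```shell
#!/usr/bin/env bash
# Send every recording that has a ".done" marker, then clean up.
# transfer_ready, compress_video and the "<file>.done" naming are assumptions.

compress_video() {
    # Stand-in for the compression step in Section 5: re-encode to a
    # temporary file and replace the original only on success.
    local src="$1" tmp="${1%.mp4}.tmp.mp4"
    ffmpeg -nostdin -y -i "$src" -c:v libx264 -preset slow -crf 28 "$tmp" \
        >/dev/null 2>&1 && mv -f "$tmp" "$src"
}

transfer_ready() {
    local src_dir="$1" remote="$2" log_file="$3"
    local marker file dest now failed=0
    for marker in "$src_dir"/*.done; do
        [ -e "$marker" ] || continue      # no markers: the glob stayed literal
        file="${marker%.done}"
        now="$(date '+%Y-%m-%dT%H:%M:%S%z')"
        dest="$remote/audio/"
        if [ "${file##*.}" = "mp4" ]; then
            dest="$remote/video/"
            if ! compress_video "$file"; then
                echo "$now FAIL compress $file" >>"$log_file"
                failed=1
                continue
            fi
        fi
        # --partial keeps partly sent data so an interrupted transfer can resume
        if rsync -a --partial -e ssh "$file" "$dest"; then
            rm -f "$file" "$marker"       # remove only after a successful send
            echo "$now OK $file" >>"$log_file"
        else
            echo "$now FAIL transfer $file" >>"$log_file"
            failed=1
        fi
    done
    return "$failed"
}
```

Files that fail to compress or send keep their .done marker, so the next transfer session retries them automatically.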

7. Scheduling with systemd

The entire operation is managed by systemd services and timers. These are preferred over cron for better reliability and native logging. There are separate services and timers for:

  • Starting 30-minute recordings every 30 minutes between 19:00 and 09:00.
  • Transferring recorded data after the session ends.
  • Preventing overlap using flock lock files.

Timers are defined in .timer units, and their targets point to matching .service units.
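
For illustration, a timer and service pair for the recording job might look like the units written by the script below. The unit names, the user name, the script path, and the lock file location are all hypothetical; only the schedule and the use of flock come from this article.

```shell
#!/usr/bin/env bash
# Write an example timer/service pair for the recording job into a directory.
# Unit names, user, script path and lock file are hypothetical.

write_record_units() {
    local unit_dir="$1"
    cat >"$unit_dir/barn-record.service" <<'EOF'
[Unit]
Description=Record one 30-minute audio/video session

[Service]
User=barn
# flock -n exits immediately if a previous run still holds the lock
ExecStart=/usr/bin/flock -n /tmp/barn-record.lock /home/barn/recordings/scripts/record.sh
EOF
    cat >"$unit_dir/barn-record.timer" <<'EOF'
[Unit]
Description=Start a recording every 30 minutes between 19:00 and 09:00

[Timer]
# Fires at :00 and :30 during hours 19-23 and 00-08 (last start at 08:30)
OnCalendar=*-*-* 00..08,19..23:00,30:00
Unit=barn-record.service

[Install]
WantedBy=timers.target
EOF
}
```

A calendar expression such as this one can be checked with systemd-analyze calendar before the timer is enabled, which prints the next times at which it would fire.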

8. Logging and Monitoring

All operations (recordings, compressions, and transfers) are logged using timestamped entries. Logs are saved in:

/home/barn/recordings/logs/

Each log entry includes:

  • Timestamps
  • Device identifiers
  • Success or failure status
  • ffmpeg errors (if any)
  • Transfer outcomes

This enables quick troubleshooting and long-term system health monitoring.
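
A log line with these fields could be produced by a small helper like the one below. This is a sketch only: the function name log_event, the DEVICE_ID variable, and the field order are assumptions, not the project's actual log format.

```shell
#!/usr/bin/env bash
# Append one timestamped line per operation to a log file.
# log_event, DEVICE_ID and the field order are hypothetical.

log_event() {
    local log_file="$1" status="$2"
    shift 2
    # Fall back to the host name when no device identifier is configured
    local device="${DEVICE_ID:-$(hostname)}"
    printf '%s %s %s %s\n' \
        "$(date '+%Y-%m-%dT%H:%M:%S%z')" "$device" "$status" "$*" >>"$log_file"
}

# Example call:
#   log_event /home/barn/recordings/logs/transfer.log FAIL "rsync exit 23"
```

Keeping one event per line with a fixed field order makes the logs easy to filter with grep, for example to list all failures for one device.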

9. Scalability and Deployment

Each Raspberry Pi is identified by stall/horse ID and logs accordingly. The system is designed for plug-and-play deployment in other stalls. GitLab is used for version control and synchronized updates.

10. Data Volume Estimation

For 7 camera–microphone pairs recording 30-minute sessions from 19:00 to 09:00 (14 hours, or 28 sessions per pair), using 1 GiB = 1024 MiB:

  • 28 sessions × 7 units = 196 audio files and 196 video files per night
  • Audio (20 MiB × 196) = 3,920 MiB ≈ 3.83 GiB
  • Raw video (370 MiB × 196) = 72,520 MiB ≈ 70.82 GiB
  • Compressed video (avg. 35 MiB × 196) = 6,860 MiB ≈ 6.70 GiB

Total expected transfer volume per night (with compressed video files): 10,780 MiB ≈ 10.53 GiB
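
The estimate is easy to recompute when the schedule, the number of units, or the file sizes change. The helper below is a minimal sketch; nightly_volume_gib is a hypothetical name, and the conversion assumes 1 GiB = 1024 MiB.

```shell
#!/usr/bin/env bash
# Estimate the nightly transfer volume in GiB.
# nightly_volume_gib is a hypothetical helper.
# Arguments: sessions per unit, number of units,
#            audio MiB per file, video MiB per file.

nightly_volume_gib() {
    awk -v s="$1" -v u="$2" -v a="$3" -v v="$4" 'BEGIN {
        files = s * u                       # files of each type per night
        total_mib = files * (a + v)         # one audio and one video per session
        printf "%.2f\n", total_mib / 1024   # 1 GiB = 1024 MiB
    }'
}
```

The session count is passed as an argument rather than hard-coded, so the same function covers other recording windows and other stables.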

11. Conclusion

Our horse stall data acquisition system was designed to collect horse sounds for the project, together with the video recordings used to label that data. The system records on a predetermined schedule and sends the files to the Savonia NVIDIA DGX server. Although initially installed at Rauhalahti Riding School, the system is modular and robust, making it easily applicable to other stables.

* This work is part of the “Tekoälyä talleille” project at Savonia University of Applied Sciences.


Authors:

Osman Torunoglu, RDI Specialist, DigiCenter – osman.torunoglu@savonia.fi

Johannes Geisler, RDI Specialist, DigiCenter – johannes.geisler@savonia.fi

Finlay Hare, RDI Project Worker, DigiCenter – finlay.hare@savonia.fi

Aki Happonen, Digital Development Manager, DigiCenter – aki.happonen@savonia.fi

Heli Suomala, Project Manager and Expert, Finnish Horse Information Centre – heli.suomala@hevostietokeskus.fi


Logos of four organizations in a row: a blue horse head with Finnish text, a pink rectangle with the word SAVONIA, a green and brown logo with Finnish text, and the flag of the European Union with Finnish text.