Using AI to create an audio driver with an audio feedback loop

I wanted a simple thing: when a package arrives at my door, play a sound effect through the nearest security camera’s speaker. What followed was a deep debugging session involving RTSP backchannels, AAC frame pacing, and spectrogram analysis. Here’s how I got it working.

The Setup

I run about 15 Dahua and Lorex IP cameras around my property, managed through Home Assistant with the Dahua custom integration (installed via HACS). Several cameras have built-in speakers, and the integration exposes them as media_player entities. The goal: trigger a “Hallelujah” sound effect on the camera that detects a package.

Problem 1: No Sound At All

The first attempt produced silence. The media_player.play_media service call completed without errors, but nothing came from the speaker. Time to investigate.

Checking the Hardware

First, verify the camera actually has a speaker:

curl -s --digest -u admin:PASSWORD \
  "http://CAMERA_IP/cgi-bin/devAudioOutput.cgi?action=getCollect"
# result=1 means speaker is present

Speaker confirmed. Next, check if audio encoding is enabled on the camera—a prerequisite for the RTSP backchannel:

curl -s --digest -u admin:PASSWORD -g \
  "http://CAMERA_IP/cgi-bin/configManager.cgi?action=getConfig&name=Encode[0].MainFormat[0]" \
  | grep AudioEnable

AudioEnable=false. That’s the problem. Without audio encoding enabled, the camera won’t advertise a backchannel audio track in its RTSP DESCRIBE response. No backchannel means no speaker output.

The Fix

curl -s --digest -u admin:PASSWORD -g \
  "http://CAMERA_IP/cgi-bin/configManager.cgi?action=setConfig&Encode[0].MainFormat[0].AudioEnable=true&Encode[0].ExtraFormat[0].AudioEnable=true"

After enabling audio, the RTSP DESCRIBE response now includes a sendonly audio track (trackID=5), which is the ONVIF backchannel the integration uses to send audio to the speaker.
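You can spot-check this condition yourself by scanning the SDP body of the DESCRIBE response for an audio media section marked a=sendonly. A minimal parser sketch (it assumes you have already captured the SDP text with an RTSP client; the sample below is illustrative, not a real camera's output):

```python
def find_backchannel_tracks(sdp: str):
    """Return control IDs of audio media sections marked a=sendonly."""
    sections = []
    current = None  # attributes of the m= section being parsed
    for line in sdp.splitlines():
        line = line.strip()
        if line.startswith("m="):
            current = {"audio": line.startswith("m=audio"),
                       "sendonly": False, "control": None}
            sections.append(current)
        elif current is not None:
            if line == "a=sendonly":
                current["sendonly"] = True
            elif line.startswith("a=control:"):
                current["control"] = line[len("a=control:"):]
    return [s["control"] for s in sections if s["audio"] and s["sendonly"]]
```

If this returns an empty list even with AudioEnable=true, the camera is not advertising a backchannel and nothing sent over RTSP will reach the speaker.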

I added detection for this condition to the integration—it now logs a warning at startup if audio encoding is disabled, and provides an enable_audio service on the media player entity to fix it without manual curl commands.

Problem 2: Audio Plays, But Sounds Terrible

With audio encoding enabled, sound came out of the speaker—but it was a garbled mess, compressed into a brief burst. To diagnose this properly, I needed data, not just ears.

Spectrogram-Based Debugging

I set up a recording pipeline: play audio on one camera’s speaker while recording from a nearby camera’s microphone, then generate spectrograms for visual comparison.
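I generated the spectrograms with standard audio tooling, but conceptually they are just short-time FFTs of the recorded signal. A minimal numpy sketch of the computation (window and hop sizes are my choices, not anything the pipeline requires):

```python
import numpy as np

def spectrogram(samples, sample_rate, window=1024, hop=512):
    """Magnitude spectrogram via short-time FFT.

    Returns (freqs, times, magnitudes) where magnitudes[f, t] is the
    amplitude of frequency bin f in analysis window t.
    """
    samples = np.asarray(samples, dtype=float)
    n_windows = 1 + (len(samples) - window) // hop
    # Hann-windowed overlapping frames, one FFT per frame
    frames = np.stack([samples[i * hop:i * hop + window] * np.hanning(window)
                       for i in range(n_windows)])
    mags = np.abs(np.fft.rfft(frames, axis=1)).T
    freqs = np.fft.rfftfreq(window, d=1.0 / sample_rate)
    times = np.arange(n_windows) * hop / sample_rate
    return freqs, times, mags
```

Plotting `mags` (log-scaled) against `times` and `freqs` gives the images shown below; a pure tone shows up as a horizontal band at its frequency.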

Source File

First, I generated a C major scale test tone—its staircase frequency pattern is easy to identify in spectrograms:

Test tone spectrogram showing C major scale staircase pattern
Source test tone: a C major scale with clear staircase frequency steps. Each note is distinct in the spectrogram.
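The test tone is easy to reproduce with nothing but the Python standard library. This sketch writes an 8 kHz mono WAV to match the camera pipeline; the note length and amplitude are arbitrary choices of mine:

```python
import math
import struct
import wave

SAMPLE_RATE = 8000  # Hz, matching the camera's audio pipeline
# C major scale, C4..C5 (equal-tempered frequencies in Hz)
SCALE = [261.63, 293.66, 329.63, 349.23, 392.00, 440.00, 493.88, 523.25]

def write_scale(path, note_seconds=0.5, amplitude=0.6):
    """Write a C-major-scale WAV: one sustained sine tone per note.

    The resulting spectrogram is a clean frequency staircase, which makes
    pacing and pitch errors obvious at a glance.
    """
    samples = []
    for freq in SCALE:
        for n in range(int(SAMPLE_RATE * note_seconds)):
            samples.append(amplitude * math.sin(2 * math.pi * freq * n / SAMPLE_RATE))
    with wave.open(path, "wb") as w:
        w.setnchannels(1)
        w.setsampwidth(2)  # 16-bit signed PCM
        w.setframerate(SAMPLE_RATE)
        w.writeframes(b"".join(struct.pack("<h", int(s * 32767)) for s in samples))

write_scale("c_major_scale.wav")
```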

Baseline: AirPlay Speaker

For reference, I played the Hallelujah sound effect through a high-quality AirPlay speaker (“Deck”) and recorded it on a nearby camera:

Baseline spectrogram from AirPlay speaker showing clean harmonic content
Baseline recording: Hallelujah played through an AirPlay speaker. Clear harmonic bands, good dynamic range.

Attempt 1: Through the Camera (Broken)

Here’s what the camera speaker produced with the original code:

Broken playback spectrogram showing compressed audio burst
First camera attempt: all audio compressed into a ~2 second burst at the end. The spectrogram shows broadband noise instead of harmonic content.

The entire clip was being dumped in a short burst. Clearly a pacing issue.

Attempt 2: After Reboot (Still Broken)

Post-reboot spectrogram still showing compressed audio
After camera reboot with audio enabled: still garbled. The pacing issue is in the software, not the camera.

Finding the Root Cause

The integration converts audio to AAC (8 kHz mono, 1024 samples per frame) and sends it via RTSP backchannel. The frame pacing code calculated the interval as:

frame_interval = duration / len(frames)

The problem: when audio is piped through ffmpeg (which is how the HA integration converts media files), ffmpeg doesn’t report a Duration: for piped input. So duration = 0, and frame_interval = 0. Every frame was sent instantly.

The Fix: Fixed Frame Interval

AAC at 8 kHz uses 1024 samples per frame. That’s a fixed interval:

frame_interval = 1024.0 / 8000.0  # 0.128 seconds per frame

No need to parse duration at all. Each AAC frame represents exactly 128ms of audio.
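A sketch of the corrected pacing loop (illustrative, not the integration's exact code). Scheduling each send against an absolute deadline, rather than sleeping a fixed interval after each send, keeps small sleep overshoots from accumulating into drift over a long clip:

```python
import time

SAMPLE_RATE = 8000        # Hz, fixed by the camera's AAC config
SAMPLES_PER_FRAME = 1024  # AAC frame size
FRAME_INTERVAL = SAMPLES_PER_FRAME / SAMPLE_RATE  # 0.128 s per frame

def paced_send(frames, send, sleep=time.sleep, clock=time.monotonic):
    """Send AAC frames at a fixed 128 ms cadence.

    `sleep` and `clock` are injectable for testing; in production the
    defaults (time.sleep, time.monotonic) are used.
    """
    deadline = clock()
    for frame in frames:
        send(frame)
        deadline += FRAME_INTERVAL
        delay = deadline - clock()
        if delay > 0:  # never sleep a negative amount if we're behind
            sleep(delay)
```

With duration-based pacing, `duration = 0` silently collapsed every delay to zero; here the cadence is a constant derived from the codec parameters, so there is nothing to mis-parse.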

RTSP Backchannel Test (Fixed Pacing)

Testing with the test tone through the RTSP backchannel directly, with correct 128ms pacing:

Fixed backchannel test showing clear staircase frequency pattern
RTSP backchannel with fixed 128ms pacing: the C major staircase is clearly visible. Clean, correctly-timed playback.

The staircase pattern is clearly visible—each note is distinct and properly timed.

Side-by-Side Comparisons

Here’s the before and after with the actual Hallelujah sound effect:

Side-by-side comparison of baseline AirPlay vs broken camera playback
Left: Baseline (AirPlay speaker). Right: Camera with broken pacing (v1). The camera version is compressed into a brief burst with no harmonic structure.
Three-way comparison showing baseline, broken, and fixed playback
Three-way comparison. Left: Baseline (AirPlay). Center: v1 with no pacing (all frames instant). Right: v2 with fixed 128ms pacing. The v2 spectrogram closely matches the baseline’s harmonic structure.

The v2 fix (right panel) closely matches the baseline (left panel). The harmonic content is clearly visible and properly spread across the full duration of the clip.

The Integration Changes

I contributed these fixes back to the Dahua integration:

  1. Fixed RTSP backchannel frame pacing: Use the mathematically correct 128ms interval (1024 samples / 8000 Hz) instead of trying to derive it from ffmpeg’s duration output.
  2. Audio encoding detection: At startup, the integration checks if AudioEnable is set on the camera’s encode config and logs a warning if not.
  3. enable_audio service: A new Home Assistant service on media player entities that enables audio encoding on the camera without needing to use curl or the camera’s web UI.
  4. Lorex compatibility: Lorex cameras (Dahua OEM) don’t support the audio.cgi HTTP endpoint. The integration detects this and falls back to RTSP backchannel automatically.

The Automation

With working speaker audio, the automation is straightforward. Each camera that can detect packages triggers the sound on its own speaker, throttled to once per hour per camera:

automation:
  - alias: Package Arrived play sound
    triggers:
      - entity_id: sensor.front_entry_package_count
        above: 0
        trigger: numeric_state
        id: front_entry
      - entity_id: sensor.garage_l_package_count
        above: 0
        trigger: numeric_state
        id: garage_left
      # ... more cameras
    actions:
      # last_played and speaker are resolved from the trigger id via
      # automation variables (abbreviated here)
      - condition: template
        value_template: >-
          {{ now().timestamp() - last_played > 3600 }}
      - action: media_player.play_media
        target:
          entity_id: "{{ speaker }}"
        data:
          media_content_id: media-source://media_source/local/Hallelujah-sound-effect.mp3
          media_content_type: music

Lessons Learned

  • Spectrograms are invaluable for audio debugging. They immediately show whether the problem is pacing, encoding, distortion, or something else entirely.
  • Record from a second camera to capture what the speaker actually outputs, rather than relying on subjective listening.
  • Fixed-interval pacing is more robust than duration-based calculation for streaming protocols. The math is simple: samples_per_frame / sample_rate = interval.
  • Check audio encoding first. On Dahua/Lorex cameras, the speaker won’t work unless AudioEnable=true in the encode config. This setting persists across reboots.
  • Lorex quirks: Lorex cameras are Dahua OEM but have different firmware. They don’t support audio.cgi but do support RTSP ONVIF backchannel. Some have flaky HTTP servers after soft reboots.

The complete code changes are in the Dahua integration fork, and the manual testing scripts (spectrogram generation, recording, analysis) are in the manual_tests/ directory.

ProPresenter Media Cleanup Guide

How we cleaned up a ProPresenter media library: removing duplicates and old content, and fixing broken media paths after a username change.

The Problem

Our ProPresenter installation had several issues:

  • Duplicate files wasting disk space (4.6 GB of duplicates)
  • Old content like funeral slideshows and dated events no longer needed
  • Broken media paths after the Mac username changed from mediateam to worshipmedia
  • Media referenced paths like /Users/Shared/Renewed Vision Media/ that no longer existed

Part 1: Finding and Deleting Duplicate Files

We created a bash script to find files with identical content using MD5 hashes, preferring to keep “originals” over files with _copy in the name.

#!/bin/bash
# find_duplicates.sh - Find and delete duplicate files in ProPresenter Media folder

MEDIA_DIR="$HOME/Documents/ProPresenter/Media"
ONEDRIVE_DIR="$HOME/OneDrive - Your Church Name/ProPresenter_Sync/Media"

# Set to 1 to actually delete, 0 for dry run
DRY_RUN=1

# Create temp files
HASH_FILE=$(mktemp)
DUPLICATES_FILE=$(mktemp)
trap "rm -f $HASH_FILE $DUPLICATES_FILE" EXIT

echo "Scanning $MEDIA_DIR..."

# Calculate MD5 hashes for all files
find "$MEDIA_DIR" -type f ! -name ".*" -print0 | while IFS= read -r -d '' file; do
    hash=$(md5 -q "$file" 2>/dev/null)
    if [[ -n "$hash" ]]; then
        echo "$hash|$file"
    fi
done > "$HASH_FILE"

# Find duplicate hashes
cut -d'|' -f1 "$HASH_FILE" | sort | uniq -d > "$DUPLICATES_FILE"

# Process each duplicate set
while IFS= read -r dup_hash; do
    files=()
    while IFS='|' read -r hash filepath; do
        [[ "$hash" == "$dup_hash" ]] && files+=("$filepath")
    done < "$HASH_FILE"

    # Keep original (file without _copy), delete others
    keep=""
    for f in "${files[@]}"; do
        if [[ ! "$f" == *"_copy"* && ! "$f" == *" copy"* ]]; then
            keep="$f"
            break
        fi
    done
    [[ -z "$keep" ]] && keep="${files[0]}"

    echo "KEEP: $keep"
    for f in "${files[@]}"; do
        if [[ "$f" != "$keep" ]]; then
            if [[ $DRY_RUN -eq 0 ]]; then
                rm -f "$f"
                # Also delete from OneDrive sync
                relative_path="${f#$MEDIA_DIR/}"
                rm -f "$ONEDRIVE_DIR/$relative_path"
            fi
            echo "  DELETE: $f"
        fi
    done
done < "$DUPLICATES_FILE"

Results: Found 227 duplicate sets, deleted 305 files, freed 4.6 GB.
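If you'd rather cross-check the shell script's grouping logic, the same hash-and-keep decision is a few lines of Python (illustrative; the `_copy`/` copy` markers mirror the script above):

```python
import hashlib
from collections import defaultdict
from pathlib import Path

def group_duplicates(root):
    """Group files under root by content hash; return only groups of 2+."""
    by_hash = defaultdict(list)
    for path in sorted(Path(root).rglob("*")):
        if path.is_file() and not path.name.startswith("."):
            digest = hashlib.md5(path.read_bytes()).hexdigest()
            by_hash[digest].append(path)
    return {h: fs for h, fs in by_hash.items() if len(fs) > 1}

def choose_keeper(files):
    """Prefer a file without a 'copy' marker in its name; else the first."""
    for f in files:
        if "_copy" not in f.name and " copy" not in f.name:
            return f
    return files[0]
```

Everything in a group except `choose_keeper(group)` is a candidate for deletion; as with the shell version, review the output before removing anything.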

Part 2: Finding Old/One-Time Content

We searched for presentations that were unlikely to be needed again:

  • Memorial and funeral services (named after individuals)
  • Dated annual events (Christmas Pageant 2021, Confirmation 2022)
  • One-time events (Town Hall presentations, Scout ceremonies)
  • Duplicate hymns in Special folder that exist in Default library

# Find presentations with dates or person names
find ~/Documents/ProPresenter/Libraries -name "*.pro" -exec basename {} \; \
  | grep -iE "[0-9]{4}|memorial|funeral|recognition|pageant"

We created a review file listing candidates for deletion with comments explaining why each could be removed, then manually reviewed before deleting.

Part 3: Finding Associated Media for Old Presentations

ProPresenter stores imported slides in Media/Imported/{UUID}/ folders. We needed to find which media folders were ONLY used by presentations being deleted (not shared with active presentations).

#!/usr/bin/env python3
# find_unique_media.py - Find media only used by presentations marked for deletion

import re
from pathlib import Path

PROPRESENTER_DIR = Path.home() / "Documents/ProPresenter"
UUID_PATTERN = re.compile(r'[0-9A-F]{8}-[0-9A-F]{4}-[0-9A-F]{4}-[0-9A-F]{4}-[0-9A-F]{12}', re.IGNORECASE)

def extract_uuids(filepath):
    """Extract all UUIDs referenced in a .pro file."""
    with open(filepath, 'rb') as f:
        content = f.read().decode('utf-8', errors='ignore')
    return set(UUID_PATTERN.findall(content))

# delete_presentations / keep_presentations are lists of .pro file paths,
# populated from the manually reviewed candidate file (abbreviated here)
delete_presentations = []
keep_presentations = []

# Get UUIDs from presentations to delete vs keep
delete_uuids = set()
keep_uuids = set()

for pro_file in delete_presentations:
    delete_uuids.update(extract_uuids(pro_file))

for pro_file in keep_presentations:
    keep_uuids.update(extract_uuids(pro_file))

# UUIDs only in delete set are safe to remove
unique_uuids = delete_uuids - keep_uuids

# Find corresponding Media/Imported folders
for uuid in unique_uuids:
    folder = PROPRESENTER_DIR / "Media/Imported" / uuid
    if folder.exists():
        print(f"Safe to delete: {folder}")

Results: Found 5 unique media folders (44.5 MB) containing memorial slideshow images that could be safely deleted.

Part 4: Fixing Broken Media Paths

After a username change from mediateam to worshipmedia, all media paths were broken. ProPresenter stores paths in two places:

  1. Playlist files (protobuf format)
  2. Workspace database (LevelDB format)

Fixing Playlist Files with Protobuf

ProPresenter 7 uses Protocol Buffers for playlist files. We used the reverse-engineered schema from greyshirtguy/ProPresenter7-Proto.

# Clone the proto definitions
git clone https://github.com/greyshirtguy/ProPresenter7-Proto.git ~/dev/ProPresenter7-Proto

# Install protobuf tools
pip3 install grpcio-tools

# Compile proto files to Python
cd ~/dev/ProPresenter7-Proto/proto
python3 -m grpc_tools.protoc -I. --python_out=. *.proto

#!/usr/bin/env python3
# fix_media_paths.py - Fix paths in ProPresenter playlist files

import sys
from pathlib import Path
from google.protobuf.message import Message

sys.path.insert(0, str(Path.home() / "dev/ProPresenter7-Proto/proto"))
from proto import propresenter_pb2

PATH_MAPPINGS = [
    ("/Users/Shared/Renewed Vision Media/",
     "/Users/worshipmedia/Documents/ProPresenter/Media/Renewed Vision Media/"),
    ("/Users/mediateam/", "/Users/worshipmedia/"),
    ("/Users/tom/", "/Users/worshipmedia/"),
]

def fix_string(s):
    for old, new in PATH_MAPPINGS:
        s = s.replace(old, new)
    return s

def fix_message(msg, path="root"):
    """Recursively fix all string fields containing paths."""
    for field in msg.DESCRIPTOR.fields:
        if field.label == 3:  # Repeated
            for i, item in enumerate(getattr(msg, field.name)):
                if field.message_type:
                    fix_message(item, f"{path}.{field.name}[{i}]")
                elif field.type == 9 and '/' in item:  # String with path
                    getattr(msg, field.name)[i] = fix_string(item)
        elif field.message_type:
            sub_msg = getattr(msg, field.name)
            if sub_msg.ByteSize() > 0:
                fix_message(sub_msg, f"{path}.{field.name}")
        elif field.type == 9:  # String
            value = getattr(msg, field.name)
            if value and '/' in value:
                setattr(msg, field.name, fix_string(value))

# Parse and fix the Media playlist
media_file = Path.home() / "Documents/ProPresenter/Playlists/Media"
doc = propresenter_pb2.PlaylistDocument()
doc.ParseFromString(media_file.read_bytes())
fix_message(doc)
media_file.write_bytes(doc.SerializeToString())

Results: Fixed 3,472 path references in the Media playlist.
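A quick way to confirm the rewrite took is to scan the serialized file for leftover old-path bytes (the prefixes come from the PATH_MAPPINGS above; zero means every reference was rewritten):

```python
from pathlib import Path

OLD_PREFIXES = [
    b"/Users/Shared/Renewed Vision Media/",
    b"/Users/mediateam/",
    b"/Users/tom/",
]

def count_stale_paths(path):
    """Count occurrences of any old path prefix in a binary playlist file."""
    data = Path(path).read_bytes()
    return sum(data.count(prefix) for prefix in OLD_PREFIXES)
```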

Fixing the Workspace Database

ProPresenter caches media information in a LevelDB database at:

~/Library/Application Support/RenewedVision/ProPresenter/Workspaces/ProPresenter-{ID}/Database/

The simplest fix was to let ProPresenter rebuild this database:

  1. Quit ProPresenter completely
  2. Stop the helper processes:
    pkill -9 -f "ProPresenter"
    launchctl bootout gui/$(id -u)/com.renewedvision.propresenter.workspaces-helper
  3. Delete or rename the Database folder
  4. Restart ProPresenter – it rebuilds the database and rescans media

Temporary Symlinks for Legacy Paths

For presentation files (.pro) that still reference old paths, we created symlinks:

# For /Users/Shared paths
mkdir -p /Users/Shared/Documents
ln -sf ~/Documents/ProPresenter /Users/Shared/Documents/ProPresenter
ln -sf ~/Documents/ProPresenter/Media/"Renewed Vision Media" "/Users/Shared/Renewed Vision Media"

# For old username paths (requires sudo)
sudo mkdir -p /Users/mediateam/Documents
sudo ln -sf /Users/worshipmedia/Documents/ProPresenter /Users/mediateam/Documents/ProPresenter

Summary

Task                     Files Affected           Space Freed
Duplicate removal        305 files                4.6 GB
Old presentations        24 files                 785 KB
Orphaned media folders   5 folders (187 files)    44.5 MB
Path fixes               3,472 references

Total space recovered: ~4.7 GB

Tools Used

  • md5 – macOS built-in hash tool for duplicate detection
  • protobuf/grpcio-tools – For parsing ProPresenter playlist files
  • ProPresenter7-Proto – Reverse-engineered protobuf schema
  • Python 3 – Scripting for media analysis and path fixing

Tips

  1. Always run duplicate finder in dry-run mode first
  2. Back up the Playlists/Media file before modifying
  3. The ProPresenter workspace database rebuilds automatically – sometimes deleting it is the easiest fix
  4. When deleting media, also delete from your sync folder (OneDrive, Dropbox, etc.)
  5. Check both Media/Assets/ and Media/Renewed Vision Media/ for files – they may be in unexpected locations