Segment audio (segment sound): a practical definition

Segmenting audio means identifying the parts of an audio clip that correspond to a target sound event, then separating that sound into its own track. In modern tools, the selection can be driven by text prompts, visual cues, or time-span prompts.

If you’re here for SAM Audio, the fastest way to learn is to try the official demo first, then come back for prompt tips.

How to segment sound (step by step)

  • Pick a short clip (10–30s) with a clear target sound.
  • Start with a text prompt that names the sound, not the source (“footsteps”, “applause”, “engine”, “snare drum”).
  • If the tool supports it, narrow down with a span prompt: mark the time range where the sound is most obvious.
  • Preview the extracted output and listen for leakage (other sounds bleeding into the separated track).
  • Refine the prompt: be more specific (“quiet footsteps on pavement” vs “footsteps”), or shorten the span.
  • Export the separated track for editing.
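
The steps above can be sketched as a small loop. This is a minimal illustration, not SAM Audio's API: `SegmentRequest`, `check_request`, and `segment_loop` are hypothetical names, `separate` stands in for whatever separation tool you use, and `accept` stands in for you listening to the preview.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Sequence, Tuple

Samples = Sequence[float]


@dataclass(frozen=True)
class SegmentRequest:
    """One attempt: a text prompt plus an optional time span (hypothetical type)."""
    prompt: str
    span: Optional[Tuple[float, float]] = None  # (start_s, end_s) in seconds


def check_request(req: SegmentRequest, clip_seconds: float) -> None:
    """Reject empty prompts and spans that fall outside the clip."""
    if not req.prompt.strip():
        raise ValueError("prompt must name a sound")
    if req.span is not None:
        start, end = req.span
        if not (0.0 <= start < end <= clip_seconds):
            raise ValueError(f"span {req.span} is outside 0..{clip_seconds}s")


def segment_loop(
    samples: Samples,
    sample_rate: int,
    attempts: List[SegmentRequest],
    separate: Callable[[Samples, int, SegmentRequest], Samples],
    accept: Callable[[Samples], bool],
) -> Optional[Tuple[SegmentRequest, Samples]]:
    """Try each attempt in order and return the first one that is accepted."""
    clip_seconds = len(samples) / sample_rate
    for req in attempts:
        check_request(req, clip_seconds)
        # Run the (assumed) separation tool with the prompt and optional span.
        track = separate(samples, sample_rate, req)
        # Preview: in practice a person listens for leakage here.
        if accept(track):
            return req, track  # ready to export
    return None  # no attempt was good enough; refine the prompts or spans
```

A typical attempt list would go from general to specific, for example `SegmentRequest("footsteps")` followed by `SegmentRequest("quiet footsteps on pavement", span=(2.0, 6.5))`.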

Prompt tips that usually improve separation

  • Prefer concrete sound names: “laughter”, “typing”, “glass clink”.
  • Avoid mixing goals in one prompt. Segment one sound event at a time.
  • If the target overlaps heavily with other sounds (for example, music under speech), tighten the span to the clearest segment first.
  • Iterate quickly—promptable separation is an “edit loop”, not a one-shot.
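
Two of these tips lend themselves to small helpers. The sketch below is an assumption-laden illustration, not part of any tool: `prompt_warnings` uses a rough heuristic (joining words and symbols) to flag prompts that may name more than one sound, and `tighten_span` shrinks a span around a point you choose, defaulting to the midpoint when you have not picked the clearest moment yet.

```python
import re
from typing import List, Optional, Tuple

# Heuristic only: words and symbols that often join two sounds in one prompt.
_MIXED_GOAL = re.compile(r"\s(?:and|plus)\s|[,+&/]", re.IGNORECASE)


def prompt_warnings(prompt: str) -> List[str]:
    """Return human-readable warnings for a text prompt (empty list if none)."""
    text = prompt.strip()
    if not text:
        return ["empty prompt"]
    if _MIXED_GOAL.search(text):
        return ["prompt may mix several sounds; segment one event at a time"]
    return []


def tighten_span(
    span: Tuple[float, float],
    keep: float = 0.5,
    center: Optional[float] = None,
) -> Tuple[float, float]:
    """Shrink a (start_s, end_s) span to `keep` of its length.

    The new span is centered on `center` (default: the midpoint) and is
    shifted if needed so that it stays inside the original span.
    """
    start, end = span
    if end <= start:
        raise ValueError("span end must be after span start")
    if not 0.0 < keep <= 1.0:
        raise ValueError("keep must be in (0, 1]")
    length = (end - start) * keep
    mid = (start + end) / 2.0 if center is None else center
    new_start = min(max(mid - length / 2.0, start), end - length)
    return (new_start, new_start + length)
```

For example, `tighten_span((2.0, 6.0))` halves the span around its midpoint, while passing `center=` lets you aim the shorter span at the moment where the target is easiest to hear.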

Next steps

If you’re editing right now, these are the fastest follow-ups: prompt packs · fix bad results · isolate sounds guide.

Source links: Meta AI blog · About Meta news