Audio Processing API

Scalable audio REST API to convert, trim, concatenate, optimize, and compress audio files.

An audio processing URL has the following parts:

  • Base URL:    https://upcdn.io
  • Account:     /W142hJk/
  • API:         audio
  • File Path:   /example.mp3
  • Parameters:  ?br=96

1 Upload your input

First, your audio file must be uploaded to, or accessible by, Bytescale:

Use the Bytescale Dashboard to upload a file manually.

Use the Upload Widget, Bytescale SDKs or Bytescale API to upload a file programmatically.

Use our external storage options to process external audio.

2 Build your audio URL

Build an audio processing URL:

2a

Get the raw URL for your file:

https://upcdn.io/W142hJk/raw/example.mp3

2b

Replace "raw" with "audio":

https://upcdn.io/W142hJk/audio/example.mp3

2c

Add querystring parameters to control the output:

https://upcdn.io/W142hJk/audio/example.mp3?br=96
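The steps above can be sketched as a small helper. This is illustrative only: `buildAudioUrl` is a hypothetical function, not part of any Bytescale SDK.

```javascript
// Convert a raw Bytescale file URL into an audio processing URL.
// Hypothetical helper for illustration; not part of any Bytescale SDK.
function buildAudioUrl(rawUrl, params = {}) {
  // Step 2b: replace the "raw" path segment with "audio".
  const audioUrl = rawUrl.replace("/raw/", "/audio/");
  // Step 2c: append querystring parameters to control the output.
  const query = Object.entries(params)
    .map(([key, value]) => `${key}=${encodeURIComponent(value)}`)
    .join("&");
  return query ? `${audioUrl}?${query}` : audioUrl;
}

console.log(buildAudioUrl("https://upcdn.io/W142hJk/raw/example.mp3", { br: 96 }));
// → https://upcdn.io/W142hJk/audio/example.mp3?br=96
```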

3 Play your audio

Play your audio by navigating to the URL from step 2.

By default, your audio will be encoded to AAC.

The default HTTP response will be an HTML webpage with an embedded audio player. This is for debug purposes only: developers are expected to override this behavior by specifying an f option when embedding audio into their webpages and apps.

Example #1: Embedding an audio file

To embed audio in a webpage using Video.js:

<!DOCTYPE html>
<html>
  <head>
    <link href="https://unpkg.com/video.js@7/dist/video-js.min.css" rel="stylesheet">
    <script src="https://unpkg.com/video.js@7/dist/video.min.js"></script>
    <style type="text/css">
      .audio-container {
        height: 316px;
        max-width: 600px;
      }
    </style>
  </head>
  <body>
    <div class="audio-container">
      <video-js
        class="vjs-fill vjs-big-play-centered"
        controls
        preload="auto">
        <p class="vjs-no-js">To play this audio please enable JavaScript.</p>
      </video-js>
    </div>
    <script>
      var vid = document.querySelector('video-js');
      var player = videojs(vid, {responsive: true});
      player.on('loadedmetadata', function() {
        // Begin playing from the start of the audio. (Required for 'f=hls-aac-rt'.)
        player.currentTime(player.seekable().start(0));
      });
      player.src({
        src: 'https://upcdn.io/W142hJk/audio/example.mp3!f=hls-aac-rt&br=80&br=256',
        type: 'application/x-mpegURL'
      });
    </script>
  </body>
</html>

The f=hls-aac-rt output format is designed to reduce the wait time for your listeners when the given audio has not been transcoded before. Like the other output formats, this audio format incurs an initial delay while transcoding starts. However, unlike the other formats, once transcoding begins the audio will be streamed to listeners during transcoding. As with the other formats, once transcoded, the resulting audio will be cached and will not need to be transcoded again.

Example #2: Creating MP3 audio

To create an MP3 file:

1

Upload an input file (e.g. an audio or video file) or create an external file source.

2

Replace /raw/ with /audio/ in the file's URL, and then append ?f=mp3 to the URL.

3

Navigate to the URL (i.e. request the URL using a simple GET request).

4

Wait for status: "Succeeded" in the JSON response.

5

The result will contain a URL to the MP3 file:

https://upcdn.io/W142hJk/audio/example.mp3?f=mp3
{
  "jobUrl": "https://api.bytescale.com/v2/accounts/W142hJk/jobs/ProcessFileJob/01H3211XMV1VH829RV697VE3WM",
  "jobDocs": "https://www.bytescale.com/docs/job-api/GetJob",
  "jobId": "01H3211XMV1VH829RV697VE3WM",
  "jobType": "ProcessFileJob",
  "accountId": "W142hJk",
  "created": 1686916626075,
  "lastUpdated": 1686916669389,
  "status": "Succeeded",
  "summary": {
    "result": {
      "type": "Artifact",
      "artifact": "/audio.mp3",
      "artifactUrl": "https://upcdn.io/W142hJk/audio/example.mp3!f=mp3&a=/audio.mp3"
    }
  }
}
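The "request the URL, then wait for status: "Succeeded"" steps can be sketched as a polling loop. This is an illustrative sketch, not official SDK code; the "Failed" status and the polling defaults below are assumptions.

```javascript
// Poll an audio processing URL until the transcode job finishes.
// Illustrative sketch; the "Failed" status value is an assumption.
async function waitForTranscode(url, { intervalMs = 2000, maxAttempts = 60 } = {}) {
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    const response = await fetch(url);
    const job = await response.json();
    if (job.status === "Succeeded") {
      // The artifact URL points at the finished output file.
      return job.summary.result.artifactUrl;
    }
    if (job.status === "Failed") throw new Error(`Transcode failed: ${job.jobId}`);
    await new Promise(resolve => setTimeout(resolve, intervalMs));
  }
  throw new Error("Timed out waiting for transcode job");
}
```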

Example #3: Creating AAC audio

To create an AAC file:

1

Upload an input file (e.g. an audio or video file) or create an external file source.

2

Replace /raw/ with /audio/ in the file's URL, and then append ?f=aac to the URL.

3

Navigate to the URL (i.e. request the URL using a simple GET request).

4

Wait for status: "Succeeded" in the JSON response.

5

The result will contain a URL to the AAC file:

https://upcdn.io/W142hJk/audio/example.mp3?f=aac
{
  "jobUrl": "https://api.bytescale.com/v2/accounts/W142hJk/jobs/ProcessFileJob/01H3211XMV1VH829RV697VE3WM",
  "jobDocs": "https://www.bytescale.com/docs/job-api/GetJob",
  "jobId": "01H3211XMV1VH829RV697VE3WM",
  "jobType": "ProcessFileJob",
  "accountId": "W142hJk",
  "created": 1686916626075,
  "lastUpdated": 1686916669389,
  "status": "Succeeded",
  "summary": {
    "result": {
      "type": "Artifact",
      "artifact": "/audio.aac",
      "artifactUrl": "https://upcdn.io/W142hJk/audio/example.mp3!f=aac&a=/audio.aac"
    }
  }
}

Example #4: Creating WAV audio

To create a WAV file:

1

Upload an input file (e.g. an audio or video file) or create an external file source.

2

Replace /raw/ with /audio/ in the file's URL, and then append ?f=wav-riff to the URL.

3

Navigate to the URL (i.e. request the URL using a simple GET request).

4

Wait for status: "Succeeded" in the JSON response.

5

The result will contain a URL to the WAV file:

https://upcdn.io/W142hJk/audio/example.mp3?f=wav-riff
{
  "jobUrl": "https://api.bytescale.com/v2/accounts/W142hJk/jobs/ProcessFileJob/01H3211XMV1VH829RV697VE3WM",
  "jobDocs": "https://www.bytescale.com/docs/job-api/GetJob",
  "jobId": "01H3211XMV1VH829RV697VE3WM",
  "jobType": "ProcessFileJob",
  "accountId": "W142hJk",
  "created": 1686916626075,
  "lastUpdated": 1686916669389,
  "status": "Succeeded",
  "summary": {
    "result": {
      "type": "Artifact",
      "artifact": "/audio.wav",
      "artifactUrl": "https://upcdn.io/W142hJk/audio/example.mp3!f=wav-riff&a=/audio.wav"
    }
  }
}

Example #5: Creating HLS audio with multiple bitrates

To create an HTTP Live Streaming (HLS) file:

1

Upload an input file (e.g. an audio or video file) or create an external file source.

2

Replace /raw/ with /audio/ in the file's URL, and then append ?f=hls-aac to the URL.

2a

Add parameters from the Audio Transcoding API or Audio Compression API

2b

You can create adaptive bitrate (ABR) audio by specifying multiple groups of bitrate and/or sample rate parameters. The end-user's audio player will automatically switch to the most appropriate variant during playback. By default, a single 96 kbps variant is produced.

2c

You can specify up to 10 variants. Each variant's parameters must be adjacent on the querystring. For example: br=80&sr=24&br=256&sr=48 specifies 2 variants, whereas br=80&br=256&sr=24&sr=48 specifies 3 variants (which would most likely be a mistake). You can add next=true between groups of parameters to forcefully split them into separate variants.

3

Navigate to the URL (i.e. request the URL using a simple GET request).

4

Wait for status: "Succeeded" in the JSON response.

5

The result will contain a URL to the HTTP Live Streaming (HLS) file:

https://upcdn.io/W142hJk/audio/example.mp3?f=hls-aac&br=80&br=256
{
  "jobUrl": "https://api.bytescale.com/v2/accounts/W142hJk/jobs/ProcessFileJob/01H3211XMV1VH829RV697VE3WM",
  "jobDocs": "https://www.bytescale.com/docs/job-api/GetJob",
  "jobId": "01H3211XMV1VH829RV697VE3WM",
  "jobType": "ProcessFileJob",
  "accountId": "W142hJk",
  "created": 1686916626075,
  "lastUpdated": 1686916669389,
  "status": "Succeeded",
  "summary": {
    "result": {
      "type": "Artifact",
      "artifact": "/audio.m3u8",
      "artifactUrl": "https://upcdn.io/W142hJk/audio/example.mp3!f=hls-aac&br=80&br=256&a=/audio.m3u8"
    }
  }
}
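The adjacency rule in step 2c can be sketched as a parser that splits a querystring into variant groups. This is a hypothetical helper, assuming a new group starts whenever a parameter key repeats within the current group, or when next=true appears:

```javascript
// Split HLS variant parameters (e.g. "br=80&sr=24&br=256&sr=48") into
// variant groups. Hypothetical sketch of the adjacency rule: a repeated
// key, or next=true, starts a new variant.
function splitVariants(querystring) {
  const variants = [];
  let current = {};
  for (const pair of querystring.split("&")) {
    const [key, value] = pair.split("=");
    if (key === "next") {           // next=true forcefully splits groups
      if (Object.keys(current).length) variants.push(current);
      current = {};
      continue;
    }
    if (key in current) {           // repeated key => a new variant begins
      variants.push(current);
      current = {};
    }
    current[key] = value;
  }
  if (Object.keys(current).length) variants.push(current);
  return variants;
}

console.log(splitVariants("br=80&sr=24&br=256&sr=48").length); // → 2
console.log(splitVariants("br=80&br=256&sr=24&sr=48").length); // → 3
```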

Example #6: Creating HLS audio with real-time transcoding

Real-time transcoding allows you to return HLS manifests (.m3u8 files) while they're being transcoded, rather than having to wait for the full transcode job to complete.

To create HTTP Live Streaming (HLS) audio with real-time transcoding:

1

Complete the steps from creating HLS audio.

2

Replace f=hls-aac with f=hls-aac-rt.

3

The result will be an M3U8 file that's dynamically updated as new segments finish transcoding:

https://upcdn.io/W142hJk/audio/example.mp3?f=hls-aac-rt
#EXTM3U
#EXT-X-VERSION:3
#EXT-X-INDEPENDENT-SEGMENTS
#EXT-X-STREAM-INF:BANDWIDTH=2038521,AVERAGE-BANDWIDTH=2038521,CODECS="mp4a.40.2"
example.mp3!f=hls-aac-rt&a=/0f/manifest.m3u8

Example #7: Extracting audio metadata

The Audio Metadata API allows you to extract the audio file's duration, codec, and more.

To extract an audio file's duration using JavaScript:

<!DOCTYPE html>
<html>
  <body>
    <p>Please wait, loading audio metadata...</p>
    <script>
      async function getAudioDuration() {
        const response = await fetch("https://upcdn.io/W142hJk/audio/example.mp4?f=meta");
        const jsonData = await response.json();
        const audioTrack = (jsonData.tracks ?? []).find(x => x.type === "Audio");
        if (audioTrack === undefined) {
          alert("Cannot find audio metadata.");
        }
        else {
          alert(`Duration (seconds): ${audioTrack.duration}`);
        }
      }
      getAudioDuration().then(() => {}, e => alert(`Error: ${e}`));
    </script>
  </body>
</html>

Supported Inputs

The Audio Processing API can transcode audio from video and audio files:

Supported Input Audio

The Audio Processing API can transcode audio from the following audio inputs:

File Extension(s) | Audio Container | Audio Codecs
.wma, .asf | Advanced Systems Format (ASF) | WMA, WMA2, WMA Pro
.fla, .flac | FLAC | FLAC
.mp3 | MPEG-1 Layer 3 | MP3
.ts, .m2ts | MPEG-2 TS | MP2, PCM
.aac, .mp4, .m4a | MPEG-4 | AAC
.mka | Matroska Audio Container | Opus, FLAC
.oga | OGA | Opus, Vorbis, FLAC
.wav | Waveform Audio File | PCM

Supported Input Videos

The Audio Processing API can transcode audio from the following video inputs:

File Extension(s) | Video Container | Video Codecs
.gif | No Container | GIF 87a, GIF 89a
.m2v, .mpeg, .mpg | No Container | AVC (H.264), DV/DVCPRO, HEVC (H.265), MPEG-1, MPEG-2
.3g2 | 3G2 | AVC (H.264), H.263, MPEG-4 part 2
.3gp | 3GP | AVC (H.264), H.263, MPEG-4 part 2
.wmv | Advanced Systems Format (ASF) | VC-1
.flv | Adobe Flash | AVC (H.264), Flash 9 File, H.263
.avi | Audio Video Interleave (AVI) | Uncompressed, Canopus HQ, DivX/Xvid, DV/DVCPRO, MJPEG
.mxf | Interoperable Master Format (IMF) | Apple ProRes, JPEG 2000 (J2K)
.mxf | Material Exchange Format (MXF) | Uncompressed, AVC (H.264), AVC Intra 50/100, Apple ProRes (4444, 4444 XQ, 422, 422 HQ, LT, Proxy), DV/DVCPRO, DV25, DV50, DVCPro HD, JPEG 2000 (J2K), MPEG-2, Panasonic P2, SonyXDCam, SonyXDCam MPEG-4 Proxy, VC-3
.mkv | Matroska | AVC (H.264), MPEG-2, MPEG-4 part 2, PCM, VC-1
.mpg, .mpeg, .m2p, .ps | MPEG Program Streams (MPEG-PS) | MPEG-2
.m2t, .ts, .tsv | MPEG Transport Streams (MPEG-TS) | AVC (H.264), HEVC (H.265), MPEG-2, VC-1
.dat, .m1v, .mpeg, .mpg, .mpv | MPEG-1 System Streams | MPEG-1, MPEG-2
.mp4, .mpeg4 | MPEG-4 | Uncompressed, DivX/Xvid, H.261, H.262, H.263, AVC (H.264), AVC Intra 50/100, HEVC (H.265), JPEG 2000, MPEG-2, MPEG-4 part 2, VC-1
.mov, .qt | QuickTime | Uncompressed, Apple ProRes (4444, 4444 XQ, 422, 422 HQ, LT, Proxy), DV/DVCPRO, DivX/Xvid, H.261, H.262, H.263, AVC (H.264), AVC Intra 50/100, HEVC (H.265), JPEG 2000 (J2K), MJPEG, MPEG-2, MPEG-4 part 2, QuickTime Animation (RLE)
.webm | WebM | VP8, VP9

Some codec profiles are not supported by Bytescale. It is worth noting that AVC (H.264) High 4:4:4 Predictive is currently not supported. We aim to provide a full list of supported profiles in the near future.

Audio Metadata API

Use the Audio Metadata API to extract the duration, codec, and other information from an audio file.

Instructions:

  1. Replace raw with audio in your audio URL.

  2. Append ?f=meta to the URL.

  3. The result will be a JSON payload describing the audio's tracks (see below).

Example audio metadata JSON response:

{
"tracks": [
{
"bitRate": 159980,
"bitRateMode": "VBR",
"channels": 2,
"codec": "AAC",
"codecId": "mp4a-40-2",
"frameCount": 35875,
"frameRate": 46.875,
"samplingRate": 48000,
"title": "Stereo",
"type": "Audio"
}
]
}

Audio Transcoding API

Use the Audio Transcoding API to transcode your audio to a specific format.

Use the f parameter to change the output format of the audio:

Format | Transcoding | Compression | Browser Support
f=mp3 | medium | good | all
f=aac (recommended) | medium | excellent | all
f=wav-riff | medium | none | none
f=wav-rf64 | medium | none | none
f=hls-aac | medium | excellent | requires SDK
f=hls-aac-rt | fast | excellent | requires SDK

f=mp3

Transcodes the audio to MP3 (.mp3).

Response: JSON for an asynchronous transcode job. The JSON will contain the URL to the MP3 file on job completion.

f=aac

Transcodes the audio to AAC (.aac).

Response: JSON for an asynchronous transcode job. The JSON will contain the URL to the AAC file on job completion.

f=wav-riff

Transcodes the audio to Waveform (.wav) using the RIFF wave format.

Response: JSON for an asynchronous transcode job. The JSON will contain the URL to the WAV file on job completion.

f=wav-rf64

Transcodes the audio to Waveform (.wav) using the RF64 wave format (to support output audio larger than 4GB).

Response: JSON for an asynchronous transcode job. The JSON will contain the URL to the WAV file on job completion.

f=hls-aac

Transcodes the audio to HLS AAC (.m3u8).

Response: JSON for an asynchronous transcode job. The JSON will contain the URL to the M3U8 file on job completion.

Browser support: all browsers (requires an audio player SDK with HLS support, like Video.js)

f=hls-aac-rt

Transcodes the audio to HLS AAC (.m3u8) and returns the audio while it's being transcoded.

This output format is designed to reduce the wait time for your listeners when the given audio has not been transcoded before. Like the other output formats, this audio format incurs an initial delay while transcoding starts. However, unlike the other formats, once transcoding begins the audio will be streamed to listeners during transcoding. As with the other formats, once transcoded, the resulting audio will be cached and will not need to be transcoded again.

Caveat: This format introduces challenges for some audio players and audio SDKs due to the use of a live M3U8 playlist during transcoding. As such, we generally recommend using one of the asynchronous formats (which don't end with -rt) for a simpler implementation.

Response: M3U8

Browser support: all browsers (requires an audio player SDK with HLS support, like Video.js)

f=html-aac

Returns a webpage with an embedded audio player that's configured to play the requested audio in AAC.

Useful for sharing links to audio files and for previewing/debugging audio transformation parameters.

Response: HTML

This is the default value.

f=meta

Returns metadata for the audio file (duration, codec, etc.).

See the Audio Metadata API docs for more information.

Response: JSON (audio metadata)

rt=auto

If this flag is present, the audio variant expressed by the adjacent parameters on the querystring (e.g. br=80&rt=true&br=256&rt=auto) will be returned to the user while it's being transcoded only if the transcode rate is faster than the playback rate.

Only supported by f=hls-aac-rt and f=html-aac.

This is the default value.

rt=false

If this flag is present, the audio variant expressed by the adjacent parameters on the querystring (e.g. br=80&rt=true&br=256&rt=false) will never be returned to the user while it's being transcoded.

Use this option as a performance optimization (instead of using rt=auto) when you know the variant will always transcode at a slower rate than its playback rate:

When rt=auto is used, the initial HTTP request for the M3U8 master manifest will block until the first few segments of each rt=auto and rt=true variant have been transcoded, before returning the initial M3U8 playlist.

In general, you want to exclude slow-transcoding HLS variants to reduce this latency.

If none of the HLS variants have rt=true or rt=auto then the fastest variant to transcode will be returned during transcoding.

Only supported by f=hls-aac-rt and f=html-aac.

rt=true

If this flag is present, the audio variant expressed by the adjacent parameters on the querystring (e.g. br=80&rt=true&br=256&rt=auto) will always be returned to the user while it's being transcoded.

Only supported by f=hls-aac-rt and f=html-aac.

Audio Compression API

Use the Audio Compression API to control the file size of your audio.

br=<int>

Sets the output audio bitrate (kbps).

Supported values for f=aac, f=hls-aac, f=hls-aac-rt and f=html-aac:

16, 20, 24, 28, 32, 40, 48, 56, 64, 80, 96, 112, 128, 160, 192, 224, 256, 288, 320, 384, 448, 512, 576

Supported values for f=mp3:

16, 24, 32, 40, 48, 56, 64, 72, 80, 88, 96, 104, 112, 120, 128, 136, 144, 152, 160, 168, 176, 184, 192, 200, 208, 216, 224, 232, 240, 248, 256, 264, 272, 280, 288, 296

Not applicable to f=wav-riff or f=wav-rf64 (Waveform audio is uncompressed, so its bitrate is not configurable).

Default: 96

sr=<number>

Sets the output audio sample rate (kHz).

Supported values for f=aac, f=hls-aac, f=hls-aac-rt and f=html-aac:

8, 12, 16, 22.05, 24, 32, 44.1, 48, 88.2, 96

Supported values for f=mp3:

22.05, 32, 44.1, 48

Supported values for f=wav-riff and f=wav-rf64:

8, 16, 22.05, 24, 32, 44.1, 48, 88.2, 96, 192

Note: the sample rate will be automatically adjusted if the provided value is unsupported by the requested bitrate for the requested audio format (for example, AAC only supports sample rates between 32 kHz and 48 kHz when a bitrate of 96 kbps is used).

Default: 48

Audio Trimming API

Use the Audio Trimming API to remove parts of the audio from the start and/or end.

ts=<number>

Sets the start position of audio, and removes all audio before that point.

If ts exceeds the length of the audio, then an error will be returned.

Supports numbers between 0 and 86399 with up to two decimal places. To provide frame accuracy for audio inputs, decimals will be interpreted as frame numbers, not milliseconds.

te=<number>

Sets the end position of audio, and removes all audio after that point.

If te exceeds the length of the audio, then no error will be returned, and the parameter effectively does nothing.

Supports numbers between 0 and 86399 with up to two decimal places. To provide frame accuracy for audio inputs, decimals will be interpreted as frame numbers, not milliseconds.

tm=after-repeat

Applies the trim specified by ts and/or te after the rp parameter is applied.

tm=before-repeat

Applies the trim specified by ts and/or te before the rp parameter is applied.

This is the default value.
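As a sketch of how the trim parameters combine on a URL (the values and the `buildTrimUrl` helper below are hypothetical, for illustration only):

```javascript
// Build a trim URL: keep only the audio between ts (start) and te (end).
// Hypothetical helper with illustrative values; not part of any SDK.
function buildTrimUrl(audioUrl, { ts, te }) {
  return `${audioUrl}?f=mp3&ts=${ts}&te=${te}`;
}

console.log(buildTrimUrl("https://upcdn.io/W142hJk/audio/example.mp3", { ts: 5.5, te: 30 }));
// → https://upcdn.io/W142hJk/audio/example.mp3?f=mp3&ts=5.5&te=30
```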

Audio Concatenation API

Use the Audio Concatenation API to append additional audio files to the primary audio file's timeline.

append=<string>

Appends the audio from another media file (video or audio file) to the output.

You can specify this parameter multiple times to append multiple media files.

If you specify append multiple times, then the media files will be concatenated in the order of the querystring parameters, with the primary input audio (specified on the URL's file path) playing first.

To use: specify the "file path" attribute of another media file as the query parameter's value.

rp=<int>

Number of times to play the audio file.

If this parameter appears after an append parameter, then it will repeat the appended audio file only.

If this parameter appears before any append parameters, then it will repeat the primary audio file only.

Default: 1
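The order-dependent behavior of rp and append can be sketched as a small model of the playback sequence. This is a hypothetical illustration of the documented semantics, with made-up file paths:

```javascript
// Compute the playback order implied by append/rp parameter ordering.
// Hypothetical model for illustration only; file paths are made up.
function playbackSequence(primaryFile, params) {
  const sequence = [{ file: primaryFile, plays: 1 }];
  for (const [key, value] of params) {
    if (key === "append") sequence.push({ file: value, plays: 1 });
    // rp repeats whichever file immediately precedes it on the querystring:
    if (key === "rp") sequence[sequence.length - 1].plays = Number(value);
  }
  return sequence;
}

// rp before any append repeats the primary file:
console.log(playbackSequence("/example.mp3", [["rp", 2], ["append", "/outro.mp3"]]));
// rp after an append repeats the appended file:
console.log(playbackSequence("/example.mp3", [["append", "/outro.mp3"], ["rp", 2]]));
```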

Audio pricing

The Audio Processing API is available on all Bytescale Plans.

Audio price list

Your processing quota (see pricing) is consumed based on the output audio file's duration multiplied by a "processing multiplier"; the multiplier is determined by the output file's codec.

Audio files can be played an unlimited number of times.

Your processing quota will only be deducted once per URL: for the very first request to the URL.

There is a minimum billable duration of 10 seconds per audio file.

Audio billing example:

A 60-second audio file encoded to AAC would consume 45 seconds (60 × 0.75) from your monthly processing quota.

If the audio file is initially played in January 2024, and is then played 100k times for the following 2 years, then you would be billed 45 seconds in January 2024 and 0 seconds in all the following months. (This assumes you never clear your permanent cache).

CodecProcessing Multiplier

AAC

0.75

MP3

0.75

WAV

1.15
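The billing rule can be sketched as a small calculator. This is illustrative only, and applying the 10-second minimum before the multiplier is an assumption:

```javascript
// Billable seconds = max(duration, 10s minimum) × codec multiplier.
// Sketch only: applying the minimum before the multiplier is an assumption.
const MULTIPLIERS = { AAC: 0.75, MP3: 0.75, WAV: 1.15 };

function billableSeconds(durationSeconds, codec) {
  const billedDuration = Math.max(durationSeconds, 10); // 10s minimum
  return billedDuration * MULTIPLIERS[codec];
}

console.log(billableSeconds(60, "AAC")); // → 45 (matches the example above)
```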

HLS audio pricing

When using f=hls-aac, f=hls-aac-rt or f=html-aac (which uses f=hls-aac-rt internally), your processing quota will be consumed per HLS variant.

When using f=hls-aac-rt, each real-time variant (rt=true or rt=auto) will have an additional 10 seconds added to its billable duration.

The default behavior for HLS outputs is to produce one HLS AAC variant.

You can change this behavior using the querystring parameters documented on this page.

HLS pricing example:

Given an input audio file of 60 seconds and the querystring ?f=hls-aac-rt&br=64&br=128&br=256&rt=false, you would be billed:

  • 3×60 seconds for 3× HLS variants (br=64&br=128&br=256).

  • 2×10 seconds for 2× HLS variants using real-time transcoding.

    • The first two variants on the querystring (br=64&br=128) do not specify rt parameters, so will default to rt=auto.

    • Per the pricing above, real-time variants incur an additional 10 seconds of billable duration.

  • 200 seconds total billed duration: 3×60 + 2×10
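The worked example above can be sketched in code. This is a hypothetical model (the variant objects and `hlsBilledDuration` helper are illustrative), computing billed duration before any codec multiplier is applied:

```javascript
// Total billed duration for an HLS job: per-variant duration, plus a
// 10-second surcharge for each real-time (rt=true / rt=auto) variant.
// Hypothetical sketch of the worked example above.
function hlsBilledDuration(durationSeconds, variants) {
  return variants.reduce((total, variant) => {
    const realtime = variant.rt !== "false"; // rt defaults to "auto"
    return total + durationSeconds + (realtime ? 10 : 0);
  }, 0);
}

// 60s input, 3 variants, the last one opting out of real-time transcoding:
const variants = [{ br: 64 }, { br: 128 }, { br: 256, rt: "false" }];
console.log(hlsBilledDuration(60, variants)); // → 200
```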
