Voice
Voice connections operate in a similar fashion to the Gateway connection. However, they use a different set of payloads and a separate UDP-based connection for voice data transmission. Because UDP is used for both receiving and transmitting voice data, your client must be able to receive UDP packets, even through a firewall or NAT (see UDP Hole Punching for more information). The Discord Voice servers implement functionality (see IP Discovery) for discovering the local machine's remote UDP IP/Port, which can assist in some network configurations.
Voice Gateway Versioning
To ensure that you have the most up-to-date information, please use version 3. Otherwise, we cannot guarantee that the Opcodes documented here will reflect what you receive over the socket.
|Version||Status||WebSocket URL Append|
|1||default||?v=1 or omit|
Voice Opcodes
|Code||Name||Sent By||Description|
|0||Identify||client||begin a voice websocket connection|
|1||Select Protocol||client||select the voice protocol|
|2||Ready||server||complete the websocket handshake|
|3||Heartbeat||client||keep the websocket connection alive|
|4||Session Description||server||describe the session|
|5||Speaking||client and server||indicate which users are speaking|
|6||Heartbeat ACK||server||sent immediately following a received client heartbeat|
|7||Resume||client||resume a connection|
|8||Hello||server||the continuous interval in milliseconds after which the client should send a heartbeat|
|13||Client Disconnect||server||a client has disconnected from the voice channel|
Voice Close Event Codes
|Code||Description||Explanation|
|4001||Unknown opcode||You sent an invalid opcode.|
|4003||Not authenticated||You sent a payload before identifying with the Gateway.|
|4004||Authentication failed||The token you sent in your identify payload is incorrect.|
|4005||Already authenticated||You sent more than one identify payload. Stahp.|
|4006||Session no longer valid||Your session is no longer valid.|
|4009||Session timeout||Your session has timed out.|
|4011||Server not found||We can't find the server you're trying to connect to.|
|4012||Unknown Protocol||We didn't recognize the protocol you sent.|
|4014||Disconnected||Oh no! You've been disconnected! Try resuming.|
|4015||Voice server crashed||The server crashed. Our bad! Try resuming.|
|4016||Unknown Encryption Mode||We didn't recognize your encryption.|
Connecting to Voice
Retrieving Voice Server Information
The first step in connecting to a voice server (and in turn, a guild's voice channel) is formulating a request that can be sent to the Gateway, which will return information about the voice server we will connect to. Because Discord's voice platform is widely distributed, users should never cache or save the results of this call. To inform the gateway of our intent to establish voice connectivity, we first send an Opcode 4 Gateway Voice State Update:
Gateway Voice State Update Example
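As a minimal sketch, the Opcode 4 request could be built like this in Python. The field names (guild_id, channel_id, self_mute, self_deaf) come from the Gateway documentation rather than from this section, and the IDs are placeholders, so treat them as assumptions:

```python
import json


def build_voice_state_update(guild_id: str, channel_id: str,
                             self_mute: bool = False,
                             self_deaf: bool = False) -> str:
    """Serialize a Gateway Opcode 4 Voice State Update request."""
    payload = {
        "op": 4,
        "d": {
            # Field names are assumed from the Gateway documentation.
            "guild_id": guild_id,
            "channel_id": channel_id,
            "self_mute": self_mute,
            "self_deaf": self_deaf,
        },
    }
    return json.dumps(payload)


# Placeholder IDs, for illustration only.
request = build_voice_state_update("GUILD_ID_PLACEHOLDER", "CHANNEL_ID_PLACEHOLDER")
```

The resulting string is what we would send over the already-open Gateway websocket.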
If our request succeeded, the gateway will respond with two events, a Voice State Update event and a Voice Server Update event, so your library must wait for both events before continuing. The first will contain a new key, session_id, and the second will provide voice server information we can use to establish a new voice connection:
Example Voice Server Update Payload
With this information, we can move on to establishing a voice websocket connection.
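Because the two events can arrive in either order, a library needs somewhere to collect their fields until both are present. The following is one possible sketch, not a prescribed design. The class name, the dispatch names VOICE_STATE_UPDATE and VOICE_SERVER_UPDATE, and the user_id filter are assumptions made for illustration:

```python
from typing import Optional


class VoiceConnectionInfo:
    """Collects the fields needed to open a voice websocket connection."""

    def __init__(self, user_id: str) -> None:
        self.user_id = user_id
        self.session_id: Optional[str] = None
        self.token: Optional[str] = None
        self.endpoint: Optional[str] = None

    def handle_dispatch(self, event_name: str, data: dict) -> None:
        # Voice State Update carries session_id; ignore updates about other users.
        if event_name == "VOICE_STATE_UPDATE":
            if data.get("user_id") == self.user_id:
                self.session_id = data["session_id"]
        # Voice Server Update carries the token and the endpoint.
        elif event_name == "VOICE_SERVER_UPDATE":
            self.token = data["token"]
            self.endpoint = data["endpoint"]

    def is_complete(self) -> bool:
        """True once both events have been seen."""
        return None not in (self.session_id, self.token, self.endpoint)
```

Only once is_complete() returns True should the library go on to open the voice websocket. The values should not be kept for later reuse, since the results of this call must never be cached.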
Establishing a Voice Websocket Connection
Once we retrieve a session_id, token, and endpoint information, we can connect and handshake with the voice server over another secure websocket. Unlike the gateway endpoint we receive in an HTTP Get Gateway request, the endpoint received from our Voice Server Update payload does not contain a URL protocol, so some libraries may require manually prepending it with "wss://" before connecting. Once connected to the voice websocket endpoint, we can send an Opcode 0 Identify payload with our server_id, user_id, session_id, and token:
Example Voice Identify Payload
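The two steps described above, prepending the protocol and building the Identify payload, might look like this in Python. This is a sketch only: the helper names are hypothetical and the argument values are placeholders.

```python
import json


def voice_websocket_url(endpoint: str) -> str:
    """Prepend wss:// when the endpoint has no URL protocol."""
    if endpoint.startswith(("ws://", "wss://")):
        return endpoint
    return "wss://" + endpoint


def build_identify(server_id: str, user_id: str,
                   session_id: str, token: str) -> str:
    """Serialize a voice Opcode 0 Identify payload."""
    return json.dumps({
        "op": 0,
        "d": {
            "server_id": server_id,
            "user_id": user_id,
            "session_id": session_id,
            "token": token,
        },
    })


# Placeholder values, for illustration only.
url = voice_websocket_url("voice.example.invalid")
identify = build_identify("SERVER_ID", "USER_ID", "SESSION_ID", "TOKEN")
```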
The voice server should respond with an Opcode 2 Ready payload, which informs us of the ssrc, UDP port, and supported encryption modes the voice server expects:
Example Voice Ready Payload
"modes": ["plain", "xsalsa20_poly1305"],
heartbeat_interval here is an erroneous field and should be ignored. The correct heartbeat_interval value comes from the Hello payload.
Heartbeating
In order to maintain your websocket connection, you need to continuously send heartbeats at the interval determined in Opcode 8 Hello:
Example Hello Payload
This is sent at the start of the connection. Unlike the other payloads, Opcode 8 Hello does not have an opcode or a data field denoted by d. Be sure to expect this different format.
There is currently a bug in the Hello payload heartbeat interval. Until it is fixed, please take your heartbeat interval as heartbeat_interval * .75. This warning will be removed and a changelog published when the bug is fixed.
After this, you should send Opcode 3 Heartbeat, which contains an integer nonce, every elapsed interval:
Example Heartbeat Payload
In return, you will be sent back an Opcode 6 Heartbeat ACK that contains the previously sent nonce:
Example Heartbeat ACK Payload
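The heartbeat exchange can be sketched in Python as follows. The function names are hypothetical, and the shape of the Hello message is an assumption: the helper accepts heartbeat_interval either at the top level or under a d key, since the format is described above as differing from the other payloads.

```python
import json

# Workaround factor from the heartbeat interval warning above.
HEARTBEAT_BUG_FACTOR = 0.75


def heartbeat_delay_seconds(hello: dict) -> float:
    """Return the delay between heartbeats, in seconds."""
    # Assumption: tolerate heartbeat_interval at the top level or under "d".
    data = hello.get("d", hello)
    return data["heartbeat_interval"] * HEARTBEAT_BUG_FACTOR / 1000.0


def build_heartbeat(nonce: int) -> str:
    """Serialize an Opcode 3 Heartbeat carrying an integer nonce."""
    return json.dumps({"op": 3, "d": nonce})


def is_matching_ack(message: str, nonce: int) -> bool:
    """Check that a message is an Opcode 6 Heartbeat ACK for our nonce."""
    payload = json.loads(message)
    return payload.get("op") == 6 and payload.get("d") == nonce
```

A client would sleep for heartbeat_delay_seconds(hello) between sends and could treat a missing or mismatched ACK as a sign that the connection is unhealthy.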
Establishing a Voice UDP Connection
Once we receive the properties of a UDP voice server from our Opcode 2 Ready payload, we can proceed to the final step of voice connections, which entails establishing and handshaking a UDP connection for voice data. First, we open a UDP connection to the same endpoint we originally received in the Voice Server Update payload, combined with the port we received in the Voice Ready payload. If required, we can now perform an IP Discovery using this connection. Once we've fully discovered our external IP and UDP port, we can then tell the voice websocket what it is, and start receiving/sending data. We do this using Opcode 1 Select Protocol:
The plain mode is no longer supported. All data should be sent using a supported encryption method, right now only
Example Select Protocol Payload
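A Select Protocol payload could be assembled along these lines in Python. The field names (protocol, data, address, port, mode) are assumptions, since they are not spelled out in this section, and the address and port shown are placeholders. The choose_mode helper is hypothetical and simply skips the unsupported plain mode.

```python
import json
from typing import List


def choose_mode(offered_modes: List[str]) -> str:
    """Pick the first offered encryption mode other than 'plain'."""
    for mode in offered_modes:
        if mode != "plain":
            return mode
    raise ValueError("no supported encryption mode offered")


def build_select_protocol(address: str, port: int, mode: str) -> str:
    """Serialize an Opcode 1 Select Protocol payload."""
    return json.dumps({
        "op": 1,
        "d": {
            # Field names below are assumed, not taken from this section.
            "protocol": "udp",
            "data": {"address": address, "port": port, "mode": mode},
        },
    })


# Modes as listed in the Ready payload; address and port are placeholders.
mode = choose_mode(["plain", "xsalsa20_poly1305"])
select_protocol = build_select_protocol("203.0.113.7", 50000, mode)
```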
Finally, the voice server will respond with an Opcode 4 Session Description that includes the secret_key, a 32-byte array used for encrypting and sending voice data:
Example Session Description Payload
"secret_key": [ ...251, 100, 11...]
We can now start encrypting and sending voice data over the previously established UDP connection.
Encrypting and Sending Voice
Voice data sent to Discord should be encoded with Opus, using two channels (stereo) and a sample rate of 48kHz. Voice data is sent using an RTP header, followed by encrypted Opus audio data. Voice encryption uses the key passed in Opcode 4 Session Description together with a 24-byte nonce (the 12-byte header with 12 null bytes appended), encrypted with libsodium:
Encrypted Voice Packet Header Structure
|Field||Type||Size|
|Type||Single byte value of |
|Version||Single byte value of |
|Sequence||unsigned short (big endian)||2 bytes|
|Timestamp||unsigned int (big endian)||4 bytes|
|SSRC||unsigned int (big endian)||4 bytes|
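The header layout above can be packed with Python's struct module. This is a sketch under stated assumptions: the two single-byte values are not given in the table, so 0x80 and 0x78 are assumed here, and the function names are hypothetical. The encryption call is left out because it depends on which libsodium binding is used.

```python
import struct

# Assumed values: the table above does not state these two bytes.
RTP_TYPE_BYTE = 0x80
RTP_VERSION_BYTE = 0x78


def build_rtp_header(sequence: int, timestamp: int, ssrc: int) -> bytes:
    """Pack the 12-byte header: two single bytes, then big-endian fields."""
    return struct.pack(
        ">BBHII",
        RTP_TYPE_BYTE,
        RTP_VERSION_BYTE,
        sequence & 0xFFFF,        # unsigned short, wraps at 65536
        timestamp & 0xFFFFFFFF,   # unsigned int, wraps at 2**32
        ssrc,
    )


def build_nonce(header: bytes) -> bytes:
    """Append 12 null bytes to the header to form the 24-byte nonce."""
    return header + b"\x00" * 12


header = build_rtp_header(sequence=1, timestamp=960, ssrc=42)
nonce = build_nonce(header)
```

The packet sent over UDP would then be the header followed by the Opus frame encrypted with the secret_key and this nonce.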
Speaking
To notify clients that you are speaking or have stopped speaking, send an Opcode 5 Speaking payload:
Example Speaking Payload
You must send at least one Opcode 5 Speaking payload before sending voice data, or you will be disconnected with an invalid SSRC error.
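As a rough sketch, a Speaking payload could be built like this in Python. The field names (speaking, delay, ssrc) are assumptions, since the example payload is not reproduced in this section, and the ssrc value is a placeholder for the one received in Opcode 2 Ready.

```python
import json


def build_speaking(ssrc: int, speaking: bool, delay: int = 0) -> str:
    """Serialize an Opcode 5 Speaking payload."""
    return json.dumps({
        "op": 5,
        "d": {
            # Field names are assumed, not taken from this section.
            "speaking": speaking,
            "delay": delay,
            "ssrc": ssrc,
        },
    })


# Send this once before the first voice packet; ssrc is a placeholder.
start_speaking = build_speaking(ssrc=1, speaking=True)
stop_speaking = build_speaking(ssrc=1, speaking=False)
```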
Voice Data Interpolation
When there's a break in the sent data, the packet transmission shouldn't simply stop. Instead, send five frames of silence (0xF8, 0xFF, 0xFE) before stopping to avoid unintended Opus interpolation with subsequent transmissions.
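One way to apply this rule is to wrap the stream of Opus frames so that the silence frames are always appended when the source runs dry. The generator below is a hypothetical helper, shown only as a sketch:

```python
from typing import Iterable, Iterator

# The three-byte Opus silence frame given above.
OPUS_SILENCE_FRAME = bytes([0xF8, 0xFF, 0xFE])
SILENCE_FRAME_COUNT = 5


def with_trailing_silence(frames: Iterable[bytes]) -> Iterator[bytes]:
    """Yield every source frame, then five frames of silence."""
    for frame in frames:
        yield frame
    for _ in range(SILENCE_FRAME_COUNT):
        yield OPUS_SILENCE_FRAME
```

Each yielded frame would still be encrypted and sent with its own header, exactly like ordinary voice data.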
Resuming Voice Connection
When your client detects that its connection has been severed, it should open a new websocket connection. Once the new connection has been opened, your client should send an Opcode 7 Resume payload:
Example Resume Connection Payload
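A Resume payload might be built as follows in Python. The field names (server_id, session_id, token) are assumed to mirror those of Identify, since the example payload is not reproduced in this section, and the values are placeholders.

```python
import json


def build_resume(server_id: str, session_id: str, token: str) -> str:
    """Serialize an Opcode 7 Resume payload."""
    return json.dumps({
        "op": 7,
        "d": {
            # Field names are assumed to match the Identify payload.
            "server_id": server_id,
            "session_id": session_id,
            "token": token,
        },
    })


# Placeholder values, for illustration only.
resume = build_resume("SERVER_ID", "SESSION_ID", "TOKEN")
```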
If successful, the Voice server will respond with an Opcode 9 Resumed and an Opcode 8 Hello to signal that your client is now reconnected:
Example Resumed Payload
If the resume is unsuccessful—for example, due to an invalid session—the websocket connection will close with the appropriate close event code. You should then follow the Connecting flow to reconnect.
IP Discovery
Generally, routers on the internet mask or obfuscate UDP ports through a process called NAT. Most users who implement voice will want to use IP discovery to find their external IP and port, which will then be used for receiving voice communications. To retrieve your external IP and port, send a 70-byte packet with empty data past the 4-byte ssrc. The server will respond with another 70-byte packet, this time containing a NULL-terminated string of the IP, with the port encoded as a little endian unsigned short in the last two bytes of the packet.
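The request and response packets can be handled with Python's struct module, as sketched below. Two details are assumptions rather than facts from this section: the ssrc is written big endian, and the IP string in the response is taken to start right after the 4-byte ssrc. The function names are hypothetical.

```python
import struct
from typing import Tuple

DISCOVERY_PACKET_SIZE = 70


def build_discovery_request(ssrc: int) -> bytes:
    """Build the 70-byte request: 4-byte ssrc, then empty (zero) data."""
    # Assumption: the ssrc is encoded big endian.
    return struct.pack(">I", ssrc).ljust(DISCOVERY_PACKET_SIZE, b"\x00")


def parse_discovery_response(packet: bytes) -> Tuple[str, int]:
    """Extract the external IP and port from the 70-byte response."""
    if len(packet) != DISCOVERY_PACKET_SIZE:
        raise ValueError("expected a 70-byte discovery response")
    # Assumption: the NULL-terminated IP string begins after the ssrc.
    ip = packet[4:-2].split(b"\x00", 1)[0].decode("ascii")
    # The port is a little endian unsigned short in the last two bytes.
    (port,) = struct.unpack("<H", packet[-2:])
    return ip, port
```

The request would be sent over the UDP socket opened to the voice server, and the parsed address and port are what we then report in Opcode 1 Select Protocol.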