- Get User Media(GUM) for OBS
The OBS Browser Source doesn't support direct access to media like Camera, Microphone, Desktop Capture, Midi and Game Controllers. The Get User Media for OBS sends media from a Chrome Browser to the OBS Browser source.
graph LR;
subgraph um[User Media]
direction LR
m[DesktopCapture,
Microphone,
Midi,
Gamepad];
end
subgraph MediaPipe
direction LR
MPa(Video and Text\nAnalysis);
end
C(GUM for OBS)
um--> C;
MediaPipe-->C;
C ---> |WebRTC|B(OBS Browser);
C ---> |WebSocket|B;
subgraph OBS
B(Browser Source);
end
Open GUM for OBS in a Chrome browser.
Optionally, download and run GUM for OBS locally.
📝 Note: The WebSpeech API doesn't work off-line. The WebSpeech API is used for Speech Recognition. Google annouced at I/O 2024 that local off-line WebSpeech is coming soon.
Use the OBS Plug-in "Local Vocal" when running GUM for OBS offline.
Open OBS.
- In the menu bar click Tools --> WebSocket Server Settings
- Check the box to "Enable WebSocket Server"
- Click the "Show Connect Info" button
- copy the password
- To capture a desktop window, check the "Desktop Window" box. how to create an OBS projector window
- To capture a microphone, check the "Microphone" box.
📝 Note: if you are only using Midi or Gamepad inputs choosing a window or microphone to capture is optional.
- Create a Scene named "Scene"
- Add a Video Capture Device source
- Open a preview window
The audio data from the selected microphone is sent to OBS as a browser event message. By default messages are sent 1 per second. The frames per second can be set to a preferred speed.
Message name: audio-input
Event listener example
window.addEventListener("audio-input", function (event) {
console.log("audio received: ",event)
document.getElementById("audioFFT").innerText = JSON.stringify(event.detail);
});
Event data
Audio is processed into 128 key value pairs.
Object
{
"dataArray": {
"0": 177,
"1": 179,
...
"127": 0
}
}
The MIDI data from detected devices is sent to OBS as a browser event message. The webMIDI.js library is used to process the MIDI devices and messages.
Message name: midi-message
Event listener example
window.addEventListener("midi-message", function (event) {
//console.log("midi received: ",event)
document.getElementById("midiMessage").innerText = JSON.stringify(
event.detail
);
});
Event data The MIDI event message includes the "deviceName" and "midiEvent" details.
Object
{
"deviceName": "iPad",
"midiEvent": "{\"rawData\":{\"0\":144,\"1\":62,\"2\":0},\"data\":[144,62,0],\"statusByte\":144,\"rawDataBytes\":{\"0\":62,\"1\":0},\"dataBytes\":[62,0],\"isChannelMessage\":true,\"isSystemMessage\":false,\"command\":9,\"channel\":1,\"type\":\"noteon\"}"
}
The Gamepad data from detected devices is sent to OBS as a browser event message.
Message name: gamepad-message
Event listener example
window.addEventListener("gamepad-message", function (event) {
console.log("gamepad-message received: ",event)
document.getElementById("gamepad-message").innerText = JSON.stringify(
event.detail
);
});
Event data The Gamepad event message details are sent as string.
Object
{
"gamepadEvent": "{\n \"index\": 0,\n \"id\": \"8BitDo Micro gamepad (Vendor: 2dc8 Product: 9020)\",\n \"mapping\": \"\",\n \"axes\": [\n -0.003921568393707275,\n -0.003921568393707275,\n -0.003921568393707275,\n 0,\n -1,\n -0.003921568393707275,\n 0,\n 0,\n 0,\n 1.2857143878936768\n ],\n \"buttons\": [\n {\n \"pressed\": false,\n \"touched\": false,\n \"value\": 0\n },\n {\n \"pressed\": false,\n \"touched\": false,\n \"value\": 0\n },\n {\n \"pressed\": false,\n \"touched\": false,\n \"value\": 0\n },\n {\n \"pressed\": false,\n \"touched\": false,\n \"value\": 0\n },\n {\n \"pressed\": false,\n \"touched\": false,\n \"value\": 0\n },\n {\n \"pressed\": false,\n \"touched\": false,\n \"value\": 0\n },\n {\n \"pressed\": false,\n \"touched\": false,\n \"value\": 0\n },\n {\n \"pressed\": false,\n \"touched\": false,\n \"value\": 0\n },\n {\n \"pressed\": false,\n \"touched\": false,\n \"value\": 0\n },\n {\n \"pressed\": false,\n \"touched\": false,\n \"value\": 0\n },\n {\n \"pressed\": false,\n \"touched\": false,\n \"value\": 0\n },\n {\n \"pressed\": false,\n \"touched\": false,\n \"value\": 0\n },\n {\n \"pressed\": false,\n \"touched\": false,\n \"value\": 0\n },\n {\n \"pressed\": false,\n \"touched\": false,\n \"value\": 0\n },\n {\n \"pressed\": false,\n \"touched\": false,\n \"value\": 0\n },\n {\n \"pressed\": false,\n \"touched\": false,\n \"value\": 0\n }\n ],\n \"vibrationActuator\": null\n}"
}
The Google MediaPipe ML models are used dectect pose, hand and face landmarks. Deteced landmarks are sent to OBS Browsers.
sequenceDiagram
GUM->>OBS: Connect to OBS
GUM->>GUM: Start Desktop capture
GUM->>+GUM: Start landmark detection
GUM->>OBS: Send landmarks to browser
0 - nose | 10 - mouth (right) | 20 - right index | 30 - right heel |
---|---|---|---|
1 - left eye (inner) | 11 - left shoulder | 21 - left thumb | 31 - left foot index |
2 - left eye | 12 - right shoulder | 22 - right thumb | 32 - right foot index |
3 - left eye (outer) | 13 - left elbow | 23 - left hip | |
4 - right eye (inner) | 14 - right elbow | 24 - right hip | |
5 - right eye | 15 - left wrist | 25 - left knee | |
6 - right eye (outer) | 16 - right wrist | 26 - right knee | |
7 - left ear | 17 - left pinky | 27 - left ankle | |
8 - right ear | 18 - right pinky | 28 - right ankle | |
9 - mouth (left) | 19 - left index | 29 - left heel |
Message name:
Event listener example
window.addEventListener("pose-landmarks", function (event) {
//Do Something
});
Event data
Object
{
"poseLandmarkerResult": [
{
"visibility": 0.999914288520813,
"x": 0.5488027334213257,
"y": 0.629255473613739,
"z": -1.2427579164505005
},
{
"visibility": 0.9997429847717285,
"x": 0.5757827758789062,
"y": 0.5809763669967651,
"z": -1.1561990976333618
},
{
"visibility": 0.9997959733009338,
"x": 0.5933735966682434,
"y": 0.5823040008544922,
"z": -1.1562100648880005
},
{
"visibility": 0.9996959567070007,
"x": 0.6136075258255005,
"y": 0.5847325921058655,
"z": -1.1562097072601318
}
...33 landmarks in the array
]
}
Message name:
Event listener example
window.addEventListener("hand-landmarks", function (event) {
//Do Something
});
Event data
Object
{
"handLandmarkEventData": "[[{\"x\":0.8247250318527222,\"y\":0.856696605682373,\"z\":1.8380380595317547e-7,\"visibility\":0},{\"x\":0.7507635354995728,\"y\":0.8093991279602051,\"z\":0.02939540520310402,\"visibility\":0},{\"x\":0.7328766584396362,\"y\":0.7667005062103271,\"z\":0.06088247895240784,\"visibility\":0},{\"x\":0.7351902723312378,\"y\":0.7290223240852356,\"z\":0.08172444999217987,\"visibility\":0},{\"x\":0.7417455911636353,\"y\":0.6961146593093872,\"z\":0.10279053449630737,\"visibility\":0},{\"x\":0.8058651685714722,\"y\":0.7270944118499756,\"z\":0.11451881378889084,\"visibility\":0},{\"x\":0.8048328161239624,\"y\":0.677108645439148,\"z\":0.14171048998832703,\"visibility\":0},{\"x\":0.8037340641021729,\"y\":0.6428552269935608,\"z\":0.14268162846565247,\"visibility\":0},{\"x\":0.8021656274795532,\"y\":0.6135662794113159,\"z\":0.1384235918521881,\"visibility\":0},{\"x\":0.8407263159751892,\"y\":0.7240689396858215,\"z\":0.09554439783096313,\"visibility\":0},{\"x\":0.8443264961242676,\"y\":0.6634253263473511,\"z\":0.1236325055360794,\"visibility\":0},{\"x\":0.8433619737625122,\"y\":0.620961606502533,\"z\":0.11451881378889084,\"visibility\":0},{\"x\":0.8402864933013916,\"y\":0.5901913642883301,\"z\":0.10323875397443771,\"visibility\":0},{\"x\":0.8709547519683838,\"y\":0.724621593952179,\"z\":0.07126611471176147,\"visibility\":0},{\"x\":0.8794510364532471,\"y\":0.6672875285148621,\"z\":0.08956820517778397,\"visibility\":0},{\"x\":0.8782241940498352,\"y\":0.6270310878753662,\"z\":0.08052921295166016,\"visibility\":0},{\"x\":0.8781704306602478,\"y\":0.5951124429702759,\"z\":0.06988411396741867,\"visibility\":0},{\"x\":0.9020516872406006,\"y\":0.7279819846153259,\"z\":0.046539608389139175,\"visibility\":0},{\"x\":0.9133985042572021,\"y\":0.6810554265975952,\"z\":0.055541250854730606,\"visibility\":0},{\"x\":0.9148292541503906,\"y\":0.6508065462112427,\"z\":0.05386044830083847,\"visibility\":0},{\"x\":0.9161320924758911,\"y\":0.621634304523468,\"z\":0.0499759167432785,\"visibility\":0}]]"
}
Message name:
Event listener example
window.addEventListener("face-landmarks", function (event) {
//Do Something
});
Event data
Object
{
"faceLandmarkerResult": {
"faceBlendshapes": [
{
"categories": [
{
"categoryName": "_neutral",
"displayName": "",
"index": 0,
"score": 0.000002109876049871673
},
{
"categoryName": "browDownLeft",
"displayName": "",
"index": 1,
"score": 0.022858552634716034
},
{
"categoryName": "browDownRight",
"displayName": "",
"index": 2,
"score": 0.012267393060028553
},
{
"categoryName": "browInnerUp",
"displayName": "",
"index": 3,
"score": 0.02032194845378399
},
{
"categoryName": "browOuterUpLeft",
"displayName": "",
"index": 4,
"score": 0.10230637341737747
},
{
"categoryName": "browOuterUpRight",
"displayName": "",
"index": 5,
"score": 0.11128177493810654
},
{
"categoryName": "cheekPuff",
"displayName": "",
"index": 6,
"score": 0.000046493329136865214
},
{
"categoryName": "cheekSquintLeft",
"displayName": "",
"index": 7,
"score": 5.526308655134926e-8
},
{
"categoryName": "cheekSquintRight",
"displayName": "",
"index": 8,
"score": 3.7752269577140396e-7
},
{
"categoryName": "eyeBlinkLeft",
"displayName": "",
"index": 9,
"score": 0.15118007361888885
},
{
"categoryName": "eyeBlinkRight",
"displayName": "",
"index": 10,
"score": 0.19274845719337463
},
{
"categoryName": "eyeLookDownLeft",
"displayName": "",
"index": 11,
"score": 0.12557381391525269
},
{
"categoryName": "eyeLookDownRight",
"displayName": "",
"index": 12,
"score": 0.16864973306655884
},
{
"categoryName": "eyeLookInLeft",
"displayName": "",
"index": 13,
"score": 0.004771905019879341
},
{
"categoryName": "eyeLookInRight",
"displayName": "",
"index": 14,
"score": 0.32876813411712646
},
{
"categoryName": "eyeLookOutLeft",
"displayName": "",
"index": 15,
"score": 0.34337201714515686
},
{
"categoryName": "eyeLookOutRight",
"displayName": "",
"index": 16,
"score": 0.020578913390636444
},
{
"categoryName": "eyeLookUpLeft",
"displayName": "",
"index": 17,
"score": 0.0757729560136795
},
{
"categoryName": "eyeLookUpRight",
"displayName": "",
"index": 18,
"score": 0.05410344526171684
},
{
"categoryName": "eyeSquintLeft",
"displayName": "",
"index": 19,
"score": 0.41864877939224243
},
{
"categoryName": "eyeSquintRight",
"displayName": "",
"index": 20,
"score": 0.3198547065258026
},
{
"categoryName": "eyeWideLeft",
"displayName": "",
"index": 21,
"score": 0.00384839647449553
},
{
"categoryName": "eyeWideRight",
"displayName": "",
"index": 22,
"score": 0.0023359579499810934
},
{
"categoryName": "jawForward",
"displayName": "",
"index": 23,
"score": 0.00011548112524906173
},
{
"categoryName": "jawLeft",
"displayName": "",
"index": 24,
"score": 0.0018875936511904001
},
{
"categoryName": "jawOpen",
"displayName": "",
"index": 25,
"score": 0.01698133535683155
},
{
"categoryName": "jawRight",
"displayName": "",
"index": 26,
"score": 0.000013287482033774722
},
{
"categoryName": "mouthClose",
"displayName": "",
"index": 27,
"score": 0.011766532436013222
},
{
"categoryName": "mouthDimpleLeft",
"displayName": "",
"index": 28,
"score": 0.03745143115520477
},
{
"categoryName": "mouthDimpleRight",
"displayName": "",
"index": 29,
"score": 0.014166870154440403
},
{
"categoryName": "mouthFrownLeft",
"displayName": "",
"index": 30,
"score": 0.000045591590605909005
},
{
"categoryName": "mouthFrownRight",
"displayName": "",
"index": 31,
"score": 0.00006781705451430753
},
{
"categoryName": "mouthFunnel",
"displayName": "",
"index": 32,
"score": 0.00022527173859998584
},
{
"categoryName": "mouthLeft",
"displayName": "",
"index": 33,
"score": 0.009081831201910973
},
{
"categoryName": "mouthLowerDownLeft",
"displayName": "",
"index": 34,
"score": 0.00025259150424972177
},
{
"categoryName": "mouthLowerDownRight",
"displayName": "",
"index": 35,
"score": 0.00023906173009891063
},
{
"categoryName": "mouthPressLeft",
"displayName": "",
"index": 36,
"score": 0.21130135655403137
},
{
"categoryName": "mouthPressRight",
"displayName": "",
"index": 37,
"score": 0.19805218279361725
},
{
"categoryName": "mouthPucker",
"displayName": "",
"index": 38,
"score": 0.0006105230422690511
},
{
"categoryName": "mouthRight",
"displayName": "",
"index": 39,
"score": 0.00007372746040346101
},
{
"categoryName": "mouthRollLower",
"displayName": "",
"index": 40,
"score": 0.01968102902173996
},
{
"categoryName": "mouthRollUpper",
"displayName": "",
"index": 41,
"score": 0.036269523203372955
},
{
"categoryName": "mouthShrugLower",
"displayName": "",
"index": 42,
"score": 0.006083718501031399
},
{
"categoryName": "mouthShrugUpper",
"displayName": "",
"index": 43,
"score": 0.0012678722850978374
},
{
"categoryName": "mouthSmileLeft",
"displayName": "",
"index": 44,
"score": 0.02724362537264824
},
{
"categoryName": "mouthSmileRight",
"displayName": "",
"index": 45,
"score": 0.032555971294641495
},
{
"categoryName": "mouthStretchLeft",
"displayName": "",
"index": 46,
"score": 0.0018876483663916588
},
{
"categoryName": "mouthStretchRight",
"displayName": "",
"index": 47,
"score": 0.011493757367134094
},
{
"categoryName": "mouthUpperUpLeft",
"displayName": "",
"index": 48,
"score": 0.00006508542719529942
},
{
"categoryName": "mouthUpperUpRight",
"displayName": "",
"index": 49,
"score": 0.00009459078864892945
},
{
"categoryName": "noseSneerLeft",
"displayName": "",
"index": 50,
"score": 3.0407329631998437e-7
},
{
"categoryName": "noseSneerRight",
"displayName": "",
"index": 51,
"score": 7.242805963869614e-7
}
],
"headIndex": -1,
"headName": ""
}
],
"faceLandmarks": [],
"facialTransformationMatrixes": []
}
}
Message name:
Event listener example
window.addEventListener("message", function (event) {
//Do Something
});
Event data
Object
{
}
After starting Speech Recognition the MediaPipe Text Sentiment can be used.
The Text Sentiment function is triggered by changes to the OBS Text Source created for Speech Recognition.
sequenceDiagram
GUM->>GUM: Start Speech Recognition
GUM->>+OBS: Start Speech Recognition event
OBS->>OBS: Text Source update
OBS->>GUM: Text Source changed event
GUM->>GUM: Text Sentiment Analysis
GUM->>OBS: Text Sentiment Result event
OBS->>OBS: Browser Source
Message name: sentimentResult
Event listener example
window.addEventListener("message", function (event) {
//Do Something
});
Event data
Object
{
}
Message name:
Event listener example
window.addEventListener("message", function (event) {
//Do Something
});
Event data
Object
{
}
Message name:
Event listener example
window.addEventListener("message", function (event) {
//Do Something
});
Event data
Object
{
}
The Web Speech API is used to convert microphone input to text.
The output text is sent to 2 OBS sources.
- an event message is sent to the OBS Browser
- An OBS Text Source
- In OBS, create a Text source to receive the output
- In GUM for OBS, enter the Text source name
Example use: lib/speechBGcolorChange.html
This event is sent after clicking the "Start Speech Recognition" button Message name: speechRecognition-started Event listener example
window.addEventListener("speechRecognition-started", function (event) {
//Do Something
});
Event data
Object
{
"obs_textSourceName": "speechRecognition"
}
Message name: speechRecognition
Event listener example
window.addEventListener("speechRecognition", function (event) {
//Do Something
});
Event data
Object
{
}