[{"data":1,"prerenderedAt":1684},["ShallowReactive",2],{"author-articles-ajoy-gonsalves":3},[4],{"id":5,"title":6,"authorId":7,"body":8,"category":1639,"created":1640,"description":1641,"extension":1642,"faqs":1643,"featurePriority":1653,"head":1654,"landingPath":1654,"meta":1655,"navigation":1657,"ogImage":1654,"path":1672,"robots":1654,"schemaOrg":1654,"seo":1673,"sitemap":1674,"stem":1675,"tags":1676,"__hash__":1683},"blog/blog/1037.building-a-voice-agent-with-vapi-and-webfuse.md","Building a Voice Agent with Vapi and Webfuse","ajoy-gonsalves",{"type":9,"value":10,"toc":1624},"minimark",[11,28,31,49,55,93,104,109,116,125,129,165,172,209,213,263,267,278,307,310,313,317,320,327,332,335,339,418,432,436,439,741,752,758,867,870,874,889,892,1017,1026,1170,1178,1181,1185,1280,1284,1291,1294,1381,1387,1391,1400,1491,1504,1506,1510,1513,1556,1563,1567,1615,1620],[12,13,14,15,20,21,27],"p",{},"Voice agents that can control a live website represent a new frontier in human-computer interaction. Instead of clicking through menus and forms, a user simply speaks - and the agent navigates, clicks, types, and reads on their behalf. In a ",[16,17,19],"a",{"href":18},"/blog/building-a-website-controlling-voice-agent-with-elevenlabs-and-webfuse","previous tutorial",", we explored how to achieve this with ElevenLabs. 
In this guide, we take the same concept and build it with ",[16,22,26],{"href":23,"rel":24},"https://vapi.ai",[25],"nofollow","Vapi",", a developer-focused voice AI platform that supports client-side tool calling out of the box.",[12,29,30],{},"By the end of this tutorial, you will have a working Webfuse Extension that:",[32,33,34,38,41],"ul",{},[35,36,37],"li",{},"Renders a floating voice orb on any proxied website",[35,39,40],{},"Connects to a Vapi-powered voice assistant on click",[35,42,43,44],{},"Gives the assistant the ability to read the page, click elements, type into fields, press keys, and navigate - all through the ",[16,45,48],{"href":46,"rel":47},"https://dev.webfu.se/automation-api",[25],"Webfuse Automation API",[12,50,51],{},[52,53,54],"strong",{},"What You Will Need:",[32,56,57,72,83],{},[35,58,59,62,63,67,68,71],{},[52,60,61],{},"A Vapi Account:"," Free tier available at ",[16,64,66],{"href":23,"rel":65},[25],"vapi.ai",". You will need your ",[52,69,70],{},"Public Key",".",[35,73,74,77,78,71],{},[52,75,76],{},"A Webfuse Account:"," Required to create a Space and deploy the extension. Sign up at ",[16,79,82],{"href":80,"rel":81},"https://webfuse.com",[25],"webfuse.com",[35,84,85,88,89,92],{},[52,86,87],{},"Node.js 18+"," and ",[52,90,91],{},"pnpm"," (or npm/yarn) installed locally.",[12,94,95,98,99,71],{},[52,96,97],{},"The Full Source Code"," is available on ",[16,100,103],{"href":101,"rel":102},"https://github.com/webfuse-com/extension-vapi-voice-agent",[25],"GitHub",[105,106,108],"h2",{"id":107},"why-vapi","Why Vapi?",[12,110,111,112,115],{},"Vapi is a voice AI platform built for developers. Unlike platforms that abstract away the model layer, Vapi gives you full control over the LLM provider, system prompt, voice, transcriber, and - critically - ",[52,113,114],{},"client-side tools",". 
This means the voice agent's tool calls can be handled entirely within the browser, with no server-side relay required.",[12,117,118,119,124],{},"For our use case, this is ideal. The Webfuse Automation API is a client-side API available within ",[16,120,123],{"href":121,"rel":122},"https://dev.webfu.se/extension-guide",[25],"Session Extensions",". When the voice agent decides to click a button, it issues a tool call that executes directly in the browser session - no round-trip to a server. No assistant needs to be pre-configured in the Vapi Dashboard - the model, prompt, voice, and tools are all defined inline in code.",[105,126,128],{"id":127},"step-1-clone-install-and-build","Step 1: Clone, Install, and Build",[130,131,136],"pre",{"className":132,"code":133,"language":134,"meta":135,"style":135},"language-console shiki shiki-themes catppuccin-latte night-owl","git clone https://github.com/webfuse-com/extension-vapi-voice-agent.git\ncd extension-vapi-voice-agent\npnpm install\npnpm build\n","console","",[137,138,139,147,153,159],"code",{"__ignoreMap":135},[140,141,144],"span",{"class":142,"line":143},"line",1,[140,145,146],{},"git clone https://github.com/webfuse-com/extension-vapi-voice-agent.git\n",[140,148,150],{"class":142,"line":149},2,[140,151,152],{},"cd extension-vapi-voice-agent\n",[140,154,156],{"class":142,"line":155},3,[140,157,158],{},"pnpm install\n",[140,160,162],{"class":142,"line":161},4,[140,163,164],{},"pnpm build\n",[12,166,167,168,171],{},"That's it. 
The build produces a ",[137,169,170],{},"dist/"," folder with everything Webfuse needs:",[130,173,175],{"className":132,"code":174,"language":134,"meta":135,"style":135},"dist/\n  background.js    # Auto-opens the popup on session start\n  content.js       # Automation API relay\n  popup.html       # Orb widget UI\n  popup.js         # Vapi SDK + tools (bundled)\n  manifest.json    # Extension manifest\n",[137,176,177,182,187,192,197,203],{"__ignoreMap":135},[140,178,179],{"class":142,"line":143},[140,180,181],{},"dist/\n",[140,183,184],{"class":142,"line":149},[140,185,186],{},"  background.js    # Auto-opens the popup on session start\n",[140,188,189],{"class":142,"line":155},[140,190,191],{},"  content.js       # Automation API relay\n",[140,193,194],{"class":142,"line":161},[140,195,196],{},"  popup.html       # Orb widget UI\n",[140,198,200],{"class":142,"line":199},5,[140,201,202],{},"  popup.js         # Vapi SDK + tools (bundled)\n",[140,204,206],{"class":142,"line":205},6,[140,207,208],{},"  manifest.json    # Extension manifest\n",[105,210,212],{"id":211},"step-2-deploy-to-webfuse","Step 2: Deploy to Webfuse",[214,215,216,232,250],"ol",{},[35,217,218,219,224,225],{},"Go to ",[16,220,223],{"href":221,"rel":222},"https://webfuse.com/studio",[25],"Webfuse Studio"," and create a Space (Solo is perfect for this use-case)",[226,227],"nuxt-picture",{"alt":228,"loading":229,"src":230,"format":231},"Create a Webfuse SPACE","lazy","/blog/building-a-voice-agent-with-vapi-and-webfuse/1.png","png",[35,233,234,235,238,239,242,243,246],{},"In the Space, open ",[52,236,237],{},"Settings"," (gear icon) > ",[52,240,241],{},"Extensions"," > ",[52,244,245],{},"Install extension",[226,247],{"alt":248,"loading":229,"src":249,"format":231},"Install extension on Webfuse","/blog/building-a-voice-agent-with-vapi-and-webfuse/2.png",[35,251,252,253,256,257,259,260],{},"Click ",[52,254,255],{},"Load unpacked in Default Storage"," and select the ",[137,258,170],{}," 
folder",[226,261],{"alt":248,"loading":229,"src":262,"format":231},"/blog/building-a-voice-agent-with-vapi-and-webfuse/3.png",[105,264,266],{"id":265},"step-3-configure-your-api-key","Step 3: Configure Your API Key",[12,268,269,270,273,274,277],{},"You can set your ",[137,271,272],{},"VAPI_PUBLIC_KEY"," either in the ",[137,275,276],{},"manifest.json"," before building, or directly in Webfuse Studio after uploading:",[214,279,280,286,295],{},[35,281,282,283],{},"In the Extensions panel, click on ",[52,284,285],{},"Vapi Voice Widget",[35,287,252,288,291,292],{},[52,289,290],{},"Configure"," next to Environment Variables",[226,293],{"alt":248,"loading":229,"src":294,"format":231},"/blog/building-a-voice-agent-with-vapi-and-webfuse/4.png",[35,296,297,298,300,301,306],{},"Set ",[137,299,272],{}," to your Vapi public key (found in the ",[16,302,305],{"href":303,"rel":304},"https://dashboard.vapi.ai",[25],"Vapi Dashboard",")",[12,308,309],{},"Open a Session in your Space. The orb appears automatically. 
Click it, grant microphone access, and start talking.",[311,312],"hr",{},[105,314,316],{"id":315},"how-it-works","How It Works",[12,318,319],{},"With the extension running, let's look under the hood at how the pieces fit together.",[226,321],{"alt":322,"loading":229,"src":323,":height":324,":width":325,"provider":326},"Diagram showing how the Vapi voice extension fits together with Webfuse","/blog/building-a-voice-agent-with-vapi-and-webfuse/5.svg","450","800","none",[328,329,331],"h3",{"id":330},"architecture","Architecture",[12,333,334],{},"The extension is structured around three components, each running in a distinct context within Webfuse:",[226,336],{"alt":337,"loading":229,"src":338,":height":324,":width":325,"provider":326},"Architecture diagram of the three extension components - popup, content script, and background worker","/blog/building-a-voice-agent-with-vapi-and-webfuse/6.svg",[340,341,342,346],"details",{},[343,344,345],"summary",{},"View table data",[347,348,349,365],"table",{},[350,351,352],"thead",{},[353,354,355,359,362],"tr",{},[356,357,358],"th",{},"Component",[356,360,361],{},"Context",[356,363,364],{},"Role",[366,367,368,386,402],"tbody",{},[353,369,370,380,383],{},[371,372,373,376,377,306],"td",{},[52,374,375],{},"Popup"," (",[137,378,379],{},"popup.ts",[371,381,382],{},"Extension page",[371,384,385],{},"Runs the Vapi SDK, handles voice + audio, renders the orb UI, processes tool calls",[353,387,388,396,399],{},[371,389,390,376,393,306],{},[52,391,392],{},"Content",[137,394,395],{},"content.ts",[371,397,398],{},"Tab page",[371,400,401],{},"Thin automation relay - receives tool call messages and executes them via the Webfuse Automation API",[353,403,404,412,415],{},[371,405,406,376,409,306],{},[52,407,408],{},"Background",[137,410,411],{},"background.ts",[371,413,414],{},"Service worker",[371,416,417],{},"Auto-opens the popup when the session starts",[419,420,421],"blockquote",{},[12,422,423,424,427,428,71],{},"For a detailed investigation 
into why this architecture is necessary, see the ",[137,425,426],{},"REPORT.md"," file in the ",[16,429,431],{"href":101,"rel":430},[25],"GitHub repository",[328,433,435],{"id":434},"the-voice-connection","The Voice Connection",[12,437,438],{},"The popup initializes the Vapi SDK and starts a call with a fully inline assistant configuration:",[130,440,444],{"className":441,"code":442,"language":443,"meta":135,"style":135},"vapi.start({\n  model: {\n    provider: \"openai\",\n    model: \"gpt-4o\",\n    messages: [{ role: \"system\", content: systemPrompt }],\n    tools: vapiTools,\n  },\n  transcriber: { provider: \"deepgram\", model: \"nova-2\", language: \"en\" },\n  voice: { provider: \"vapi\", voiceId: \"Elliot\" },\n  name: \"Webfuse Assistant\",\n  firstMessage: \"Hey! I can help you interact with this page. What would you like to do?\",\n  clientMessages: [\"tool-calls\", \"transcript\"],\n});\n","typescript",[137,445,446,466,478,499,515,559,571,577,631,667,684,701,731],{"__ignoreMap":135},[140,447,448,452,455,459,462],{"class":142,"line":143},[140,449,451],{"class":450},"vapi",[140,453,71],{"class":454},[140,456,458],{"class":457},"start",[140,460,461],{"class":450},"(",[140,463,465],{"class":464},"{\n",[140,467,468,471,475],{"class":142,"line":149},[140,469,470],{"class":450},"  model",[140,472,474],{"class":473},":",[140,476,477],{"class":464}," {\n",[140,479,480,483,485,489,493,496],{"class":142,"line":155},[140,481,482],{"class":450},"    provider",[140,484,474],{"class":473},[140,486,488],{"class":487}," \"",[140,490,492],{"class":491},"openai",[140,494,495],{"class":487},"\"",[140,497,498],{"class":464},",\n",[140,500,501,504,506,508,511,513],{"class":142,"line":161},[140,502,503],{"class":450},"    
model",[140,505,474],{"class":473},[140,507,488],{"class":487},[140,509,510],{"class":491},"gpt-4o",[140,512,495],{"class":487},[140,514,498],{"class":464},[140,516,517,520,522,525,528,531,533,535,538,540,543,546,548,551,554,557],{"class":142,"line":199},[140,518,519],{"class":450},"    messages",[140,521,474],{"class":473},[140,523,524],{"class":450}," [",[140,526,527],{"class":464},"{",[140,529,530],{"class":450}," role",[140,532,474],{"class":473},[140,534,488],{"class":487},[140,536,537],{"class":491},"system",[140,539,495],{"class":487},[140,541,542],{"class":464},",",[140,544,545],{"class":450}," content",[140,547,474],{"class":473},[140,549,550],{"class":450}," systemPrompt ",[140,552,553],{"class":464},"}",[140,555,556],{"class":450},"]",[140,558,498],{"class":464},[140,560,561,564,566,569],{"class":142,"line":205},[140,562,563],{"class":450},"    tools",[140,565,474],{"class":473},[140,567,568],{"class":450}," vapiTools",[140,570,498],{"class":464},[140,572,574],{"class":142,"line":573},7,[140,575,576],{"class":464},"  },\n",[140,578,580,583,585,588,591,593,595,598,600,602,605,607,609,612,614,616,619,621,623,626,628],{"class":142,"line":579},8,[140,581,582],{"class":450},"  transcriber",[140,584,474],{"class":473},[140,586,587],{"class":464}," {",[140,589,590],{"class":450}," provider",[140,592,474],{"class":473},[140,594,488],{"class":487},[140,596,597],{"class":491},"deepgram",[140,599,495],{"class":487},[140,601,542],{"class":464},[140,603,604],{"class":450}," model",[140,606,474],{"class":473},[140,608,488],{"class":487},[140,610,611],{"class":491},"nova-2",[140,613,495],{"class":487},[140,615,542],{"class":464},[140,617,618],{"class":450}," language",[140,620,474],{"class":473},[140,622,488],{"class":487},[140,624,625],{"class":491},"en",[140,627,495],{"class":487},[140,629,630],{"class":464}," },\n",[140,632,634,637,639,641,643,645,647,649,651,653,656,658,660,663,665],{"class":142,"line":633},9,[140,635,636],{"class":450},"  
voice",[140,638,474],{"class":473},[140,640,587],{"class":464},[140,642,590],{"class":450},[140,644,474],{"class":473},[140,646,488],{"class":487},[140,648,451],{"class":491},[140,650,495],{"class":487},[140,652,542],{"class":464},[140,654,655],{"class":450}," voiceId",[140,657,474],{"class":473},[140,659,488],{"class":487},[140,661,662],{"class":491},"Elliot",[140,664,495],{"class":487},[140,666,630],{"class":464},[140,668,670,673,675,677,680,682],{"class":142,"line":669},10,[140,671,672],{"class":450},"  name",[140,674,474],{"class":473},[140,676,488],{"class":487},[140,678,679],{"class":491},"Webfuse Assistant",[140,681,495],{"class":487},[140,683,498],{"class":464},[140,685,687,690,692,694,697,699],{"class":142,"line":686},11,[140,688,689],{"class":450},"  firstMessage",[140,691,474],{"class":473},[140,693,488],{"class":487},[140,695,696],{"class":491},"Hey! I can help you interact with this page. What would you like to do?",[140,698,495],{"class":487},[140,700,498],{"class":464},[140,702,704,707,709,711,713,716,718,720,722,725,727,729],{"class":142,"line":703},12,[140,705,706],{"class":450},"  clientMessages",[140,708,474],{"class":473},[140,710,524],{"class":450},[140,712,495],{"class":487},[140,714,715],{"class":491},"tool-calls",[140,717,495],{"class":487},[140,719,542],{"class":464},[140,721,488],{"class":487},[140,723,724],{"class":491},"transcript",[140,726,495],{"class":487},[140,728,556],{"class":450},[140,730,498],{"class":464},[140,732,734,736,738],{"class":142,"line":733},13,[140,735,553],{"class":464},[140,737,306],{"class":450},[140,739,740],{"class":464},";\n",[12,742,743,744,747,748,751],{},"The critical field is ",[137,745,746],{},"clientMessages: [\"tool-calls\", \"transcript\"]",". This tells Vapi to deliver tool call events to the client SDK rather than routing them server-side. 
When the model decides to call ",[137,749,750],{},"click_element",", the event arrives as a message in the popup, where our handler executes it locally.",[12,753,754,755,474],{},"When a tool call completes, the result is injected back into the conversation as a system message using ",[137,756,757],{},"vapi.send()",[130,759,761],{"className":441,"code":760,"language":443,"meta":135,"style":135},"vapi?.send({\n  type: \"add-message\",\n  message: {\n    role: \"system\",\n    content: `[Tool \"${name}\" result]: ${resultStr}`,\n  },\n});\n",[137,762,763,777,793,802,817,855,859],{"__ignoreMap":135},[140,764,765,767,770,773,775],{"class":142,"line":143},[140,766,451],{"class":450},[140,768,769],{"class":454},"?.",[140,771,772],{"class":457},"send",[140,774,461],{"class":450},[140,776,465],{"class":464},[140,778,779,782,784,786,789,791],{"class":142,"line":149},[140,780,781],{"class":450},"  type",[140,783,474],{"class":473},[140,785,488],{"class":487},[140,787,788],{"class":491},"add-message",[140,790,495],{"class":487},[140,792,498],{"class":464},[140,794,795,798,800],{"class":142,"line":155},[140,796,797],{"class":450},"  message",[140,799,474],{"class":473},[140,801,477],{"class":464},[140,803,804,807,809,811,813,815],{"class":142,"line":161},[140,805,806],{"class":450},"    role",[140,808,474],{"class":473},[140,810,488],{"class":487},[140,812,537],{"class":491},[140,814,495],{"class":487},[140,816,498],{"class":464},[140,818,819,822,824,828,831,835,838,840,843,845,848,850,853],{"class":142,"line":199},[140,820,821],{"class":450},"    content",[140,823,474],{"class":473},[140,825,827],{"class":826},"sizNf"," `",[140,829,830],{"class":491},"[Tool \"",[140,832,834],{"class":833},"sDF9U","${",[140,836,837],{"class":450},"name",[140,839,553],{"class":833},[140,841,842],{"class":491},"\" result]: 
",[140,844,834],{"class":833},[140,846,847],{"class":450},"resultStr",[140,849,553],{"class":833},[140,851,852],{"class":826},"`",[140,854,498],{"class":464},[140,856,857],{"class":142,"line":205},[140,858,576],{"class":464},[140,860,861,863,865],{"class":142,"line":573},[140,862,553],{"class":464},[140,864,306],{"class":450},[140,866,740],{"class":464},[12,868,869],{},"This ensures the model can read the output of its own tool calls - for example, after taking a DOM snapshot, the model receives the HTML and can describe what it sees or decide which element to target next.",[328,871,873],{"id":872},"automation-tools","Automation Tools",[12,875,876,877,880,881,884,885,888],{},"The file ",[137,878,879],{},"src/tools.ts"," defines the bridge between Vapi's tool calling system and the Webfuse Automation API. Each tool has a ",[52,882,883],{},"handler"," (the function that executes) and a ",[52,886,887],{},"definition"," (the schema Vapi sends to the LLM).",[12,890,891],{},"Here is the tool handler for clicking an element:",[130,893,895],{"className":441,"code":894,"language":443,"meta":135,"style":135},"case \"click_element\": {\n  await delegateAutomation(\"act\", \"click\", params.target, {\n    moveMouse: true,\n    scrollIntoView: true,\n  });\n  return `Clicked \"${params.target}\"`;\n}\n",[137,896,897,914,950,963,974,983,1012],{"__ignoreMap":135},[140,898,899,903,905,907,909,912],{"class":142,"line":143},[140,900,902],{"class":901},"s76yb","case",[140,904,488],{"class":487},[140,906,750],{"class":491},[140,908,495],{"class":487},[140,910,911],{"class":450},": ",[140,913,465],{"class":464},[140,915,916,919,922,925,927,930,932,934,936,939,941,943,946,948],{"class":142,"line":149},[140,917,918],{"class":450},"  await 
",[140,920,921],{"class":457},"delegateAutomation",[140,923,461],{"class":924},"sMtgK",[140,926,495],{"class":487},[140,928,929],{"class":491},"act",[140,931,495],{"class":487},[140,933,542],{"class":464},[140,935,488],{"class":487},[140,937,938],{"class":491},"click",[140,940,495],{"class":487},[140,942,542],{"class":464},[140,944,945],{"class":450}," params.target",[140,947,542],{"class":464},[140,949,477],{"class":464},[140,951,952,955,957,961],{"class":142,"line":155},[140,953,954],{"class":450},"    moveMouse",[140,956,474],{"class":464},[140,958,960],{"class":959},"sIhCM"," true",[140,962,498],{"class":464},[140,964,965,968,970,972],{"class":142,"line":161},[140,966,967],{"class":450},"    scrollIntoView",[140,969,474],{"class":464},[140,971,960],{"class":959},[140,973,498],{"class":464},[140,975,976,979,981],{"class":142,"line":199},[140,977,978],{"class":464},"  }",[140,980,306],{"class":924},[140,982,740],{"class":450},[140,984,985,988,990,993,995,998,1000,1004,1006,1008,1010],{"class":142,"line":205},[140,986,987],{"class":450},"  return ",[140,989,852],{"class":826},[140,991,992],{"class":491},"Clicked \"",[140,994,834],{"class":833},[140,996,997],{"class":450},"params",[140,999,71],{"class":454},[140,1001,1003],{"class":1002},"sL4Ga","target",[140,1005,553],{"class":833},[140,1007,495],{"class":491},[140,1009,852],{"class":826},[140,1011,740],{"class":450},[140,1013,1014],{"class":142,"line":573},[140,1015,1016],{"class":450},"}\n",[12,1018,1019,1020,1022,1023,474],{},"The ",[137,1021,921],{}," helper sends a message from the popup to the content script, which calls the corresponding method on ",[137,1024,1025],{},"browser.webfuseSession.automation",[130,1027,1029],{"className":441,"code":1028,"language":443,"meta":135,"style":135},"function delegateAutomation(\n  automationScope: string,\n  automationMethod: string,\n  ...automationArgs: any[]\n): Promise\u003Cany> {\n  return browser.tabs.sendMessage(0, {\n    automationScope,\n    automationMethod,\n 
   automationArgs,\n  });\n}\n",[137,1030,1031,1042,1056,1067,1084,1106,1137,1144,1151,1158,1166],{"__ignoreMap":135},[140,1032,1033,1036,1039],{"class":142,"line":143},[140,1034,1035],{"class":901},"function",[140,1037,1038],{"class":457}," delegateAutomation",[140,1040,1041],{"class":924},"(\n",[140,1043,1044,1047,1050,1054],{"class":142,"line":149},[140,1045,1046],{"class":959},"  automationScope",[140,1048,474],{"class":1049},"s9rnR",[140,1051,1053],{"class":1052},"scrte"," string",[140,1055,498],{"class":464},[140,1057,1058,1061,1063,1065],{"class":142,"line":155},[140,1059,1060],{"class":959},"  automationMethod",[140,1062,474],{"class":1049},[140,1064,1053],{"class":1052},[140,1066,498],{"class":464},[140,1068,1069,1072,1075,1077,1080],{"class":142,"line":161},[140,1070,1071],{"class":1049},"  ...",[140,1073,1074],{"class":959},"automationArgs",[140,1076,474],{"class":1049},[140,1078,1079],{"class":1052}," any",[140,1081,1083],{"class":1082},"sXbZB","[]\n",[140,1085,1086,1088,1090,1094,1098,1101,1104],{"class":142,"line":199},[140,1087,306],{"class":924},[140,1089,474],{"class":1049},[140,1091,1093],{"class":1092},"s-DR7"," Promise",[140,1095,1097],{"class":1096},"s0xQc","\u003C",[140,1099,1100],{"class":1052},"any",[140,1102,1103],{"class":1096},">",[140,1105,477],{"class":464},[140,1107,1108,1112,1115,1117,1121,1123,1126,1128,1132,1135],{"class":142,"line":205},[140,1109,1111],{"class":1110},"srhcd","  return",[140,1113,1114],{"class":450}," browser",[140,1116,71],{"class":454},[140,1118,1120],{"class":1119},"sHY1S","tabs",[140,1122,71],{"class":454},[140,1124,1125],{"class":457},"sendMessage",[140,1127,461],{"class":450},[140,1129,1131],{"class":1130},"sZ_Zo","0",[140,1133,542],{"class":1134},"sdjIP",[140,1136,477],{"class":464},[140,1138,1139,1142],{"class":142,"line":573},[140,1140,1141],{"class":450},"    automationScope",[140,1143,498],{"class":1134},[140,1145,1146,1149],{"class":142,"line":579},[140,1147,1148],{"class":450},"    
automationMethod",[140,1150,498],{"class":1134},[140,1152,1153,1156],{"class":142,"line":633},[140,1154,1155],{"class":450},"    automationArgs",[140,1157,498],{"class":1134},[140,1159,1160,1162,1164],{"class":142,"line":669},[140,1161,978],{"class":464},[140,1163,306],{"class":450},[140,1165,740],{"class":464},[140,1167,1168],{"class":142,"line":686},[140,1169,1016],{"class":464},[12,1171,1172,1173,1177],{},"This delegation is necessary because the ",[16,1174,1176],{"href":46,"rel":1175},[25],"Automation API"," is only available in content scripts - it operates on the live page DOM.",[12,1179,1180],{},"The full set of tools:",[226,1182],{"alt":1183,"loading":229,"src":1184,":height":324,":width":325,"provider":326},"Table of automation tools available to the voice agent - dom snapshot, click, type, key press, navigate","/blog/building-a-voice-agent-with-vapi-and-webfuse/7.svg",[340,1186,1187,1189],{},[343,1188,345],{},[347,1190,1191,1204],{},[350,1192,1193],{},[353,1194,1195,1198,1201],{},[356,1196,1197],{},"Tool",[356,1199,1200],{},"Automation Method",[356,1202,1203],{},"Purpose",[366,1205,1206,1221,1235,1250,1265],{},[353,1207,1208,1213,1218],{},[371,1209,1210],{},[137,1211,1212],{},"take_dom_snapshot",[371,1214,1215],{},[137,1216,1217],{},"see.domSnapshot()",[371,1219,1220],{},"Read the page structure with Webfuse IDs",[353,1222,1223,1227,1232],{},[371,1224,1225],{},[137,1226,750],{},[371,1228,1229],{},[137,1230,1231],{},"act.click()",[371,1233,1234],{},"Click buttons, links, any element",[353,1236,1237,1242,1247],{},[371,1238,1239],{},[137,1240,1241],{},"type_text",[371,1243,1244],{},[137,1245,1246],{},"act.type()",[371,1248,1249],{},"Type into input fields",[353,1251,1252,1257,1262],{},[371,1253,1254],{},[137,1255,1256],{},"press_key",[371,1258,1259],{},[137,1260,1261],{},"act.keyPress()",[371,1263,1264],{},"Press Enter, Escape, Tab, arrow 
keys",[353,1266,1267,1272,1277],{},[371,1268,1269],{},[137,1270,1271],{},"navigate_to",[371,1273,1274],{},[137,1275,1276],{},"automation.navigate()",[371,1278,1279],{},"Go to a different URL",[328,1281,1283],{"id":1282},"the-orb-ui","The Orb UI",[12,1285,1286,1287,1290],{},"The floating orb widget is defined in ",[137,1288,1289],{},"popup.html",". It uses CSS ping-ring animations to communicate the current call state visually:",[12,1292,1293],{},"These states are driven by CSS classes toggled from the popup script in response to Vapi events:",[130,1295,1297],{"className":441,"code":1296,"language":443,"meta":135,"style":135},"vapi.on(\"speech-start\", () => setState(\"ai-speaking\"));\nvapi.on(\"speech-end\", () => setState(\"connected\"));\n",[137,1298,1299,1343],{"__ignoreMap":135},[140,1300,1301,1303,1305,1308,1310,1312,1315,1317,1319,1322,1326,1329,1331,1333,1336,1338,1341],{"class":142,"line":143},[140,1302,451],{"class":450},[140,1304,71],{"class":454},[140,1306,1307],{"class":457},"on",[140,1309,461],{"class":450},[140,1311,495],{"class":487},[140,1313,1314],{"class":491},"speech-start",[140,1316,495],{"class":487},[140,1318,542],{"class":464},[140,1320,1321],{"class":924}," ()",[140,1323,1325],{"class":1324},"s-_ek"," =>",[140,1327,1328],{"class":457}," 
setState",[140,1330,461],{"class":450},[140,1332,495],{"class":487},[140,1334,1335],{"class":491},"ai-speaking",[140,1337,495],{"class":487},[140,1339,1340],{"class":450},"))",[140,1342,740],{"class":464},[140,1344,1345,1347,1349,1351,1353,1355,1358,1360,1362,1364,1366,1368,1370,1372,1375,1377,1379],{"class":142,"line":149},[140,1346,451],{"class":450},[140,1348,71],{"class":454},[140,1350,1307],{"class":457},[140,1352,461],{"class":450},[140,1354,495],{"class":487},[140,1356,1357],{"class":491},"speech-end",[140,1359,495],{"class":487},[140,1361,542],{"class":464},[140,1363,1321],{"class":924},[140,1365,1325],{"class":1324},[140,1367,1328],{"class":457},[140,1369,461],{"class":450},[140,1371,495],{"class":487},[140,1373,1374],{"class":491},"connected",[140,1376,495],{"class":487},[140,1378,1340],{"class":450},[140,1380,740],{"class":464},[12,1382,1383,1384,1386],{},"The orb also includes error handling. If a user clicks the orb without configuring their ",[137,1385,272],{},", a toast notification slides in with a clear message guiding them to the extension settings.",[328,1388,1390],{"id":1389},"the-manifest","The Manifest",[12,1392,1019,1393,1395,1396,1399],{},[137,1394,276],{}," includes ",[137,1397,1398],{},"host_permissions"," for all domains the Vapi SDK communicates with:",[130,1401,1405],{"className":1402,"code":1403,"language":1404,"meta":135,"style":135},"language-json shiki shiki-themes catppuccin-latte night-owl","\"host_permissions\": [\n    \"https://cdn.jsdelivr.net/*\",\n    \"https://api.vapi.ai/*\",\n    \"https://*.daily.co/*\",\n    \"https://c.daily.co/*\",\n    \"wss://*.daily.co/*\",\n    
\"https://*.ingest.sentry.io/*\"\n]\n","json",[137,1406,1407,1420,1432,1443,1454,1465,1476,1486],{"__ignoreMap":135},[140,1408,1409,1411,1413,1415,1417],{"class":142,"line":143},[140,1410,495],{"class":487},[140,1412,1398],{"class":491},[140,1414,495],{"class":487},[140,1416,911],{"class":450},[140,1418,1419],{"class":464},"[\n",[140,1421,1422,1425,1428,1430],{"class":142,"line":149},[140,1423,1424],{"class":487},"    \"",[140,1426,1427],{"class":491},"https://cdn.jsdelivr.net/*",[140,1429,495],{"class":487},[140,1431,498],{"class":464},[140,1433,1434,1436,1439,1441],{"class":142,"line":155},[140,1435,1424],{"class":487},[140,1437,1438],{"class":491},"https://api.vapi.ai/*",[140,1440,495],{"class":487},[140,1442,498],{"class":464},[140,1444,1445,1447,1450,1452],{"class":142,"line":161},[140,1446,1424],{"class":487},[140,1448,1449],{"class":491},"https://*.daily.co/*",[140,1451,495],{"class":487},[140,1453,498],{"class":464},[140,1455,1456,1458,1461,1463],{"class":142,"line":199},[140,1457,1424],{"class":487},[140,1459,1460],{"class":491},"https://c.daily.co/*",[140,1462,495],{"class":487},[140,1464,498],{"class":464},[140,1466,1467,1469,1472,1474],{"class":142,"line":205},[140,1468,1424],{"class":487},[140,1470,1471],{"class":491},"wss://*.daily.co/*",[140,1473,495],{"class":487},[140,1475,498],{"class":464},[140,1477,1478,1480,1483],{"class":142,"line":573},[140,1479,1424],{"class":487},[140,1481,1482],{"class":491},"https://*.ingest.sentry.io/*",[140,1484,1485],{"class":487},"\"\n",[140,1487,1488],{"class":142,"line":579},[140,1489,1490],{"class":464},"]\n",[12,1492,1493,1494,1496,1497,1500,1501,1503],{},"Note that ",[137,1495,1471],{}," is listed separately - WebSocket Secure URLs require their own permission. 
The ",[137,1498,1499],{},"env"," array defines ",[137,1502,272],{},", which can be configured per-Space in Webfuse Studio without rebuilding.",[311,1505],{},[105,1507,1509],{"id":1508},"what-happens-next","What Happens Next",[12,1511,1512],{},"This extension is a starting point. From here, you might consider:",[32,1514,1515,1535,1547],{},[35,1516,1517,1520,1521,1524,1525,1528,1529,1532,1533,71],{},[52,1518,1519],{},"Expanding the toolset."," The Webfuse Automation API offers additional methods like ",[137,1522,1523],{},"act.select()"," for dropdowns, ",[137,1526,1527],{},"act.textSelect()"," for highlighting text, and ",[137,1530,1531],{},"see.guiSnapshot()"," for sending a screenshot to the model. Each can be added as a new tool in ",[137,1534,879],{},[35,1536,1537,1540,1541,1546],{},[52,1538,1539],{},"Adding the Session MCP Server."," For more advanced orchestration, you can connect the ",[16,1542,1545],{"href":1543,"rel":1544},"https://dev.webfu.se/session-mcp-server",[25],"Webfuse Session MCP Server"," to route automation through the Model Context Protocol, enabling multi-agent workflows and external tool registries.",[35,1548,1549,1552,1553,1555],{},[52,1550,1551],{},"Refining the system prompt."," The default prompt in ",[137,1554,879],{}," is intentionally minimal. 
A production agent would benefit from detailed instructions about how to interpret DOM snapshots, when to use Webfuse IDs versus CSS selectors, and how to handle error recovery.",[12,1557,1558,1559,71],{},"The full source code is available at ",[16,1560,1562],{"href":101,"rel":1561},[25],"github.com/webfuse-com/extension-vapi-voice-agent",[105,1564,1566],{"id":1565},"further-reading","Further Reading",[32,1568,1569,1576,1583,1589,1595,1602,1608],{},[35,1570,1571],{},[16,1572,1575],{"href":1573,"rel":1574},"https://docs.vapi.ai/quickstart/web",[25],"Vapi Web SDK Documentation",[35,1577,1578],{},[16,1579,1582],{"href":1580,"rel":1581},"https://docs.vapi.ai/tools/client-side-websdk",[25],"Vapi Client-Side Tools",[35,1584,1585],{},[16,1586,1588],{"href":46,"rel":1587},[25],"Webfuse Automation API Reference",[35,1590,1591],{},[16,1592,1594],{"href":121,"rel":1593},[25],"Webfuse Extension Guide",[35,1596,1597],{},[16,1598,1601],{"href":1599,"rel":1600},"https://dev.webfu.se/agent-guide",[25],"Webfuse Agent Guide",[35,1603,1604],{},[16,1605,1607],{"href":1543,"rel":1606},[25],"About Webfuse Session MCP",[35,1609,1610],{},[16,1611,1614],{"href":1612,"rel":1613},"https://www.webfuse.com/blog/a-gentle-introduction-to-ai-agents-for-the-web",[25],"A Gentle Introduction to AI Agents for the Web",[1616,1617],"article-signup-cta",{"heading":1618,"subtitle":1619},"Build Your Own Voice-Controlled Web Agent","Webfuse lets you proxy any website and give AI agents full control over it - clicks, typing, navigation, and DOM reading. 
Sign up and deploy your first voice agent in minutes.",[1621,1622,1623],"style",{},"html .default .shiki span {color: var(--shiki-default);background: var(--shiki-default-bg);font-style: var(--shiki-default-font-style);font-weight: var(--shiki-default-font-weight);text-decoration: var(--shiki-default-text-decoration);}html .shiki span {color: var(--shiki-default);background: var(--shiki-default-bg);font-style: var(--shiki-default-font-style);font-weight: var(--shiki-default-font-weight);text-decoration: var(--shiki-default-text-decoration);}html .dark .shiki span {color: var(--shiki-dark);background: var(--shiki-dark-bg);font-style: var(--shiki-dark-font-style);font-weight: var(--shiki-dark-font-weight);text-decoration: var(--shiki-dark-text-decoration);}html.dark .shiki span {color: var(--shiki-dark);background: var(--shiki-dark-bg);font-style: var(--shiki-dark-font-style);font-weight: var(--shiki-dark-font-weight);text-decoration: var(--shiki-dark-text-decoration);}html pre.shiki code .s2kId, html code.shiki .s2kId{--shiki-default:#4C4F69;--shiki-dark:#D6DEEB}html pre.shiki code .s5FwJ, html code.shiki .s5FwJ{--shiki-default:#179299;--shiki-default-font-style:inherit;--shiki-dark:#C792EA;--shiki-dark-font-style:italic}html pre.shiki code .sNstc, html code.shiki .sNstc{--shiki-default:#1E66F5;--shiki-default-font-style:italic;--shiki-dark:#82AAFF;--shiki-dark-font-style:italic}html pre.shiki code .scGhl, html code.shiki .scGhl{--shiki-default:#7C7F93;--shiki-dark:#D6DEEB}html pre.shiki code .sVS64, html code.shiki .sVS64{--shiki-default:#179299;--shiki-dark:#D6DEEB}html pre.shiki code .sbuKk, html code.shiki .sbuKk{--shiki-default:#40A02B;--shiki-dark:#D9F5DD}html pre.shiki code .sfrMT, html code.shiki .sfrMT{--shiki-default:#40A02B;--shiki-dark:#ECC48D}html pre.shiki code .sizNf, html code.shiki .sizNf{--shiki-default:#40A02B;--shiki-dark:#D6DEEB}html pre.shiki code .sDF9U, html code.shiki .sDF9U{--shiki-default:#7C7F93;--shiki-dark:#D3423E}html pre.shiki code 
.s76yb, html code.shiki .s76yb{--shiki-default:#8839EF;--shiki-dark:#C792EA}html pre.shiki code .sMtgK, html code.shiki .sMtgK{--shiki-default:#7C7F93;--shiki-dark:#D9F5DD}html pre.shiki code .sIhCM, html code.shiki .sIhCM{--shiki-default:#E64553;--shiki-default-font-style:italic;--shiki-dark:#D7DBE0;--shiki-dark-font-style:inherit}html pre.shiki code .sL4Ga, html code.shiki .sL4Ga{--shiki-default:#4C4F69;--shiki-dark:#BAEBE2}html pre.shiki code .s9rnR, html code.shiki .s9rnR{--shiki-default:#179299;--shiki-dark:#7FDBCA}html pre.shiki code .scrte, html code.shiki .scrte{--shiki-default:#8839EF;--shiki-dark:#C5E478}html pre.shiki code .sXbZB, html code.shiki .sXbZB{--shiki-default:#DF8E1D;--shiki-default-font-style:italic;--shiki-dark:#D6DEEB;--shiki-dark-font-style:inherit}html pre.shiki code .s-DR7, html code.shiki .s-DR7{--shiki-default:#DF8E1D;--shiki-default-font-style:italic;--shiki-dark:#FFCB8B;--shiki-dark-font-style:inherit}html pre.shiki code .s0xQc, html code.shiki .s0xQc{--shiki-default:#04A5E5;--shiki-dark:#D6DEEB}html pre.shiki code .srhcd, html code.shiki .srhcd{--shiki-default:#8839EF;--shiki-default-font-style:inherit;--shiki-dark:#C792EA;--shiki-dark-font-style:italic}html pre.shiki code .sHY1S, html code.shiki .sHY1S{--shiki-default:#4C4F69;--shiki-default-font-style:inherit;--shiki-dark:#FAF39F;--shiki-dark-font-style:italic}html pre.shiki code .sZ_Zo, html code.shiki .sZ_Zo{--shiki-default:#FE640B;--shiki-dark:#F78C6C}html pre.shiki code .sdjIP, html code.shiki .sdjIP{--shiki-default:#7C7F93;--shiki-dark:#5F7E97}html pre.shiki code .s-_ek, html code.shiki 
.s-_ek{--shiki-default:#179299;--shiki-dark:#C792EA}",{"title":135,"searchDepth":149,"depth":149,"links":1625},[1626,1627,1628,1629,1630,1637,1638],{"id":107,"depth":149,"text":108},{"id":127,"depth":149,"text":128},{"id":211,"depth":149,"text":212},{"id":265,"depth":149,"text":266},{"id":315,"depth":149,"text":316,"children":1631},[1632,1633,1634,1635,1636],{"id":330,"depth":155,"text":331},{"id":434,"depth":155,"text":435},{"id":872,"depth":155,"text":873},{"id":1282,"depth":155,"text":1283},{"id":1389,"depth":155,"text":1390},{"id":1508,"depth":149,"text":1509},{"id":1565,"depth":149,"text":1566},"voice-ai","2026-03-27","A hands-on guide to building a voice-powered AI assistant that can see, click, type, and navigate any website - using the Vapi Web SDK and the Webfuse Automation API.","md",[1644,1647,1650],{"question":1645,"answer":1646},"Do I need to pre-configure an assistant in the Vapi Dashboard?","No. In this tutorial, the model, prompt, voice, and tools are all defined inline in code - no Vapi Dashboard configuration is required.",{"question":1648,"answer":1649},"Why does the Vapi SDK run in the popup context rather than a content script?","The Vapi SDK requires WebRTC connections to Daily.co for audio transport. In Webfuse, host_permissions are only respected in the popup and background contexts - content scripts inherit the proxied page's Content Security Policy and cannot make those connections.",{"question":1651,"answer":1652},"Can I add more automation tools beyond the five included?","Yes. The Webfuse Automation API supports additional methods like act.select() for dropdowns, act.textSelect() for highlighting text, and see.guiSnapshot() for screenshots. 
Each can be wired up as a new tool in src/tools.ts.",0,null,{"shortTitle":1656,"homepage":1657,"relatedLinks":1658},"Voice Agent with Vapi & Webfuse",true,[1659,1662,1665,1668,1670],{"text":1660,"href":1661},"Part 1: Agent Web Integration Methods","/blog/voice-agent-web-control-3-methods-compared",{"text":1663,"href":1664},"Part 2: The Technical Architecture","/blog/architecture-of-a-web-controlling-voice-agent",{"text":1666,"href":1667},"Part 3: Perception & Action API Tools","/blog/how-voice-agents-see-and-act-a-guide-to-dom-tools",{"text":1669,"href":18},"ElevenLabs + Webfuse Tutorial",{"text":1671,"href":101},"GitHub Repository","/blog/building-a-voice-agent-with-vapi-and-webfuse",{"title":6,"description":1641},{"loc":1672},"blog/1037.building-a-voice-agent-with-vapi-and-webfuse",[1677,1678,1679,1680,1681,1682],"ai-agents","voice-agents","web-agents","web-automation","browser-agents","voice-agent-series","SOJ95WKEo2t4L66Ua4i3Zw_N0OYlFxrQMGsYXUB-45o",1777376334401]