Module Summary

  • Goal: Create a custom GitHub MCP Server with Agentic RAG optimization
  • Estimated Time: 15-20 minutes
  • Prerequisites: GitHub account, Cursor IDE, xpander.ai account

🚀 In this module, you’ll learn how to build a powerful GitHub search agent with optimized Agentic RAG capabilities and expose it as a Model Context Protocol (MCP) Server to supercharge your Cursor IDE experience. By building this agent, you’ll understand how to create practical AI tools that integrate seamlessly with your development workflow, improving coding efficiency and access to real-time code examples.

🔍 Creating Your GitHub Search Agent

Step 1: Set up xpander.ai

  1. Navigate to the xpander.ai platform:

    Terminal Command
    # Open in your browser
    open https://app.xpander.ai
    
  2. Sign in with your credentials

  3. In the left navigation menu, go to AI Agents

  4. Click the New AI Agent button to open the Workbench

  5. When prompted by the agent builder, click Skip (we’ll create the agent manually)

Step 2: Configure Agent Settings

  1. In the Builder Workbench, click on the Gear Icon in the top-right corner

  2. In the Agent Builder Settings panel, click on Generate Details

  3. Enter the following prompt to generate your agent description:

    Prompt for generating agent
    You are GitHub-Search-Agent, an expert at searching and analyzing GitHub repositories, code, users, and topics. You have deep knowledge of GitHub's API and can provide concise, accurate information about code patterns, repository structures, and developer activities.
    
  4. Now define your agent’s role, goal, and instructions:

    Role
    You are GitHub-Search-Agent, an expert at searching and analyzing GitHub repositories, code, users, and topics. You have deep knowledge of GitHub's API and can provide concise, accurate information about code patterns, repository structures, and developer activities.
    
    Goal
    Your goal is to help users find relevant code, repositories, and developers on GitHub to accelerate their development process. You should prioritize providing the most relevant results and explaining code patterns when found.
    
    Instructions
    1. When searching for code, focus on finding the most relevant implementations
    2. For repositories, highlight the most popular ones with good documentation
    3. When exploring user profiles, focus on their recent contributions and expertise
    4. Prefer repositories with higher star counts and recent activity
    5. For any search query, first try to understand what the user is looking for before running the search
    6. Always provide contextual information about why a result is relevant
    
  5. Click Save Changes and close the panel

Step 3: Add GitHub Search Tools

  1. Click on the + (plus) button in your agent canvas

  2. Select Apps from the menu

  3. Select GitHub Search Manager

  4. Click Sign in with GitHub Search Manager

  5. Give the interface a name (like “github-search-your-name” or just “github-search”)

  6. Click Save

You should now see the GitHub Search tools available as Agentic Actions in your agent’s canvas.

Step 4: Add GitHub Operations

Add the following GitHub operations to your agent:

  • Find Code Snippets by Query Terms - GET /search/code
  • Find Repositories by Criteria - GET /search/repositories
  • Search Topics by Criteria - GET /search/topics
  • Find Users by Criteria - GET /search/users
  • Search Commits by Criteria - GET /search/commits

Your completed agent should look similar to this:

🧪 Testing Your Agent with Raw API Responses

Let’s test your newly created GitHub search agent:

  1. Click on the Tester tab in the xpander.ai interface

  2. Type the following query to search for topics:

    Prompt
    Search for topics in GitHub related to A2A and MCP
    
  3. Look at the generated payload by clicking on it. You’ll see something like:

    Example AI Generated Payload
    {
       "bodyParams": {},
       "queryParams": {
          "q": "A2A MCP",
          "per-page": 10,
          "page": 1
       },
       "pathParams": {}
    }
    
    Example response
    {
     "total_count": 2,
     "incomplete_results": false,
     "items": [
         {
             "name": "a2a-mcp",
             "created_at": "2025-04-30T00:52:27Z",
             "updated_at": "2025-04-30T00:52:27Z",
             "featured": false,
             "curated": false,
             "score": 1
         },
         {
             "name": "a2a-vs-mcp",
             "created_at": "2025-04-30T00:52:27Z",
             "updated_at": "2025-04-30T00:52:27Z",
             "featured": false,
             "curated": false,
             "score": 1
         }
     ]
    }
    
  4. Now try a repositories search:

    Prompt
    Find cool repositories related to Agent 2 Agent and MCP
    
    AI Generated payload
    {
       "bodyParams": {},
       "queryParams": {
          "q": "Agent 2 Agent MCP",
          "sort": "stars",
          "order": "desc",
          "per-page": 5,
          "page": 1,
          "xp_rag_query": "Agent 2 Agent MCP",
          "xp_rag_top_k": 5
       },
       "pathParams": {}
    }
    

Notice that the API response is huge and not optimal for AI consumption! This is where Agentic RAG comes in to help filter and optimize these responses.

🧠 Optimizing Responses with Agentic RAG

Let’s improve the repository search results using Agentic RAG:

  1. Return to the Builder tab

  2. Click on the gear icon next to the Find Repositories by Criteria operation

  3. Click on Advanced and then switch to Raw Editor (this allows the AI to send natural language queries, and the server will perform semantic search on the API response)

  4. Configure the search and return fields:

    Searchable Fields (fields the AI can search against):

    Searchable Fields Configuration
    [
      "items[].full_name",
      "items[].description",
      "items[].topics"
    ]
    

    Returnable Fields (data the AI will receive):

    Returnable Fields Configuration
    [
      "items[].html_url",
      "items[].full_name",
      "items[].description"
    ]
    
  5. Save your changes

  6. Return to the Tester tab and run the repositories query again:

    Prompt
    Find cool repositories related to Agent 2 Agent and MCP
    

Notice how much cleaner and more focused the response is now! Instead of overwhelming JSON data, you get just the meaningful content that matters to your query.

Next, let’s optimize the user search operation:

  1. Return to the Builder tab

  2. Click on the gear icon next to the Find Users by Criteria operation

  3. Go to AdvancedRaw Editor

  4. Configure the fields:

    Searchable Fields:

    User Search Configuration
    [
      "items[].login"
    ]
    

    Returnable Fields:

    User Return Configuration
    [
      "items[].html_url",
      "items[].login", 
      "items[].avatar_url"
    ]
    
  5. Save your changes

  6. In the Tester tab, run:

    Prompt
    Find GitHub users related to MCP and Agent 2 Agent protocol
    

Optimizing Commit Searches

Now, let’s configure commit search operations:

  1. Return to the Builder tab

  2. Click on the gear icon next to the Search Commits by Criteria operation

  3. Go to AdvancedRaw Editor

  4. Configure the fields:

    Searchable Fields:

    Commit Search Configuration
    [
      "items[].commit.message",
      "items[].author.login",
      "items[].commit.author.name",
      "items[].repository.full_name"
    ]
    

    Returnable Fields:

    Commit Return Configuration
    [
      "items[].html_url",
      "items[].sha",
      "items[].commit.message",
      "items[].commit.author.date",
      "items[].author.login", 
      "items[].repository.html_url"
    ]
    
  5. Save your changes

  6. In the Tester tab, try a follow-up question:

    Follow-up Prompt
    Summarize their recent commits
    

Finally, let’s optimize the code snippet operation:

  1. Return to the Builder tab

  2. Click on the gear icon next to the Find Code Snippets by Query Terms operation

  3. Go to AdvancedRaw Editor

  4. Configure the fields:

    Searchable Fields:

    Searchable Fields Configuration
    [
      "items[].name",
      "items[].path",
      "items[].repository.full_name",
      "items[].repository.description"
    ]
    

    Returnable Fields:

    Returnable Fields Configuration
    [
      "items[].html_url",
      "items[].path",
      "items[].repository.html_url",
      "items[].repository.full_name"
    ]
    
  5. Save your changes

  6. In the Tester tab, run:

    Prompt
    Can you find the implementation of A2A code snippet?
    

The optimized results should look similar to this:

Example Output
Here are some code snippets related to "A2A" found on GitHub:

1. **Repository:** [hyunjun/bookmarks](https://github.com/hyunjun/bookmarks)
   - **File:** [c.md](https://github.com/hyunjun/bookmarks/blob/b0dce61583620ed1bff22408dfae28563e3cf28f/c.md)

2. **Repository:** [mwherman2000/did-uri-spec](https://github.com/mwherman2000/did-uri-spec)
   - **File:** [FAQ.md](https://github.com/mwherman2000/did-uri-spec/blob/deff7e273046574ec3703313b1bb833a4d060cbc/FAQ.md)

3. **Repository:** [zeromake/library](https://github.com/zeromake/library)
   - **File:** [TOC.md](https://github.com/zeromake/library/blob/6073134ad593c54897efcb3339f4a0662bc7a7b1/TOC.md)

4. **Repository:** [tansuotv/IPTVindex](https://github.com/tansuotv/IPTVindex)
   - **File:** [tv.txt](https://github.com/tansuotv/IPTVindex/blob/869d6d43ae2316cc4b0628affc9b87650a783c5a/tv.txt)

5. **Repository:** [kitnil/notes](https://github.com/kitnil/notes)
   - **File:** [s3.org](https://github.com/kitnil/notes/blob/da3da9c30cbebee0075fb92f58f8fc3dac7e7e0f/s3.org)

📄 Adding Code Reading Capabilities

Let’s enhance our agent by adding the ability to read code directly from GitHub URLs:

  1. In the left navigation menu, go to Cloud Functions
  2. Click the New button
  3. Add the following code:
GitHub Code Reader Function
# === GitHub Code Reader ===
# Contract-name: xpander_run_action
# Purpose: Retrieve any file's raw text from GitHub (public or private) so downstream
#          agents can parse the code.

# import requests – auto-imported by the runtime
import os

def xpander_run_action(repo_full_name: str, file_path: str, ref: str = "main"):
    # GitHub Code Reader
    # ------------------
    # Args:
    #     repo_full_name : str  | "owner/repo", e.g. "org/project"
    #     file_path      : str  | path within the repo, e.g. "src/app.py"
    #     ref            : str  | branch, tag, or commit SHA (default "main")
    # Optional ENV:
    #     GIT_TOKEN            | Personal Access Token for private repositories.
    #                           If present, it's sent as a Bearer token.
    # Returns:
    #     str – the file's text on success, or an error string on failure.

    raw_url = f"https://raw.githubusercontent.com/{repo_full_name}/{ref}/{file_path}"

    headers = {}
    token = os.getenv("GIT_TOKEN")
    if token:
        headers["Authorization"] = f"Bearer {token}"

    try:
        resp = requests.get(raw_url, headers=headers, timeout=15)
        resp.raise_for_status()
    except requests.RequestException as exc:
        return f"Request failed: {exc}"

    return resp.text
  1. Name it GitHub Code Reader and click Save
  2. Return to your agent in the AI Agents section
  3. Click the + button
  4. Select Custom Action
  5. Add the GitHub Code Reader function to your agent canvas
  6. Click Save -> Deploy.

Run the following prompt

Example Prompt
How can I increase throughput using cross-region inference with Amazon Bedrock and boto3? Download and provide the real python code snippet from AWS repos, not a link

The screenshot is from the agent’s WebUI. You can access yours by clicking the ‘Chat’ button on the canvas — the WebUI URL will appear

Your GitHub search agent currently can only access information available on GitHub. Let’s add web search capabilities to make it more versatile:

  1. Return to your agent canvas

  2. Click the + button

  3. Select Built-in Actions

  4. Add Fetch Tavily AI Insights (for comprehensive web searching)

  5. Click on the gear icon in your agent canvas to edit the agent settings

  6. Update your agent’s instructions by adding these important rules:

    Updated Instructions
    7. ALWAYS start with Tavily AI Insights to gather general information, but NEVER stop there
    8. After using Tavily, ALWAYS follow up with relevant GitHub searches to find concrete code examples and repositories
    9. For every user query, you must provide both web-based context AND GitHub-specific examples
    
  7. Save your changes

  8. In the Tester tab, try a query that requires recent information:

    Test Prompt
    What are the latest changes to the A2A protocol specification? Show me example implementations.
    

Your agent should now provide more comprehensive answers by combining web search results with GitHub code examples:

Sometimes, our autonomous agent might choose to use GitHub before Tavily, despite our explicit instructions. This happens because the AI model sees all available tools at once and makes decisions based on its understanding of the prompt and tool descriptions.

This reveals a fascinating design challenge in AI systems: How can we create truly autonomous agents capable of making intelligent decisions while ensuring they reliably follow critical business logic when needed? 🤔

🔄 Creating the Agent Dependency Graph

In production, to ensure your agent follows the right search strategy, we should always create a dependency graph that deterministically enforces specific business logic behavior:

  1. In your agent canvas, click on the Graph View button
  2. Set up a workflow that requires an internet search before accessing GitHub operations:
    • Connect the Tavily AI Insights node as a prerequisite to the GitHub operations
    • Ensure the flow follows a logical progression
  1. Repeat the query with explict request to not start with Tavily
test prompt
What are the latest changes to the A2A protocol specification? Show me example implementations. Use GitHub without Tavily
  1. Validate additonal queries:
Test Prompt
Using only Google's A2A protocol implementation, create an agent that responds with 'hello world'. Provide the Python code.
  1. Verify that the AI follows the correct workflow - first searching for information before accessing GitHub operations:

Congratulations! You’ve built a reliable AI Agent that balances autonomy with guardrails. By creating this structured workflow, you’ve effectively addressed both the API overflow challenge and solved the autonomous multi-step challenges

🌐 Exposing Your Agent as an MCP

The final step is to expose your optimized GitHub agent as a Model Context Protocol (MCP) server:

  1. In your agent canvas, click the + button
  2. Select SourceMCP
  3. Click Deploy
  1. After deployment, you’ll receive a unique URL for your MCP server

  2. Configure this endpoint in your Cursor IDE by editing the settings file:

    Create or edit ~/.cursor/mcp.json:

    mcp.json
    {
      "mcpServers":{
        "xpander-github-mcp": {
          "command": "npx",
          "args": [
            "xpander-mcp-remote",
            "https://mcp.xpander.ai/your-server-url/"
          ]
        }
      }
    }
    

    Make sure to replace https://mcp.xpander.ai/your-server-url/ with your actual MCP server URL from the deployment step!

  3. Restart Cursor IDE to apply the changes

  4. Test your MCP integration by asking a question in Cursor:

    Prompt to Agent (via MCP)
    search for code samples of xpander-ai sdk
    

✅ Checkpoint

Congratulations! By completing this module, you should now be able to:

  1. Create and configure a GitHub search agent with optimized Agentic RAG
  2. Customize searchable and returnable fields for each API operation
  3. Add custom code reading capabilities to your agent
  4. Structure agent workflows using dependency graphs
  5. Deploy your agent as an MCP endpoint for Cursor IDE

🔄 Next Steps

Now that you’ve supercharged your IDE with an optimized GitHub MCP Server, you’re ready to build your first coding agent in the next module.