Searching for Users and Repositories

Learn how to use semantic search for repositories and full-text search for users with powerful filtering.

Bounty Lab provides two powerful search capabilities: semantic search for repositories and full-text search for users.

Bounty Lab uses natural language queries combined with structured filters:

  • Repository Search: Semantic/vector search using embeddings - understands meaning and context
  • User Search: BM25 full-text search - fast keyword matching with relevance ranking
  • Filters: Structured filter objects for precise filtering (not query strings)
  • Results: Up to 1,000 results per search (default 100)

Repository search uses semantic search with vector embeddings to find repositories based on meaning and context, not just keywords.

const response = await client.searchRepos.search({
query: "react component library with typescript",
});
// Returns repositories semantically similar to your query.
// The response contains:
// - count: number of results
// - repositories: array of results with relevance scores
console.log(response.repositories);

The semantic search understands intent and context:

// These queries find different but semantically related results:
await client.searchRepos.search({
query: "machine learning for image classification",
});
await client.searchRepos.search({
query: "computer vision neural networks",
});

Combine semantic search with structured filters to narrow results:

// Find TypeScript UI libraries with active communities
const response = await client.searchRepos.search({
query: "component library design system",
filters: {
op: "And",
filters: [
{ field: "language", op: "Eq", value: "TypeScript" },
{ field: "stargazerCount", op: "Gte", value: 1000 },
],
},
maxResults: 50,
});

Filters follow a {field, op, value} pattern. The key operators are listed below, grouped by field type.

Important: Use FTS for Location and Free-Text Fields

For fields like locations, company names, bios, and descriptions, always use ContainsAllTokens instead of Eq or In. These fields often have variations in spelling, formatting, and case (e.g., “San Francisco, CA” vs “san francisco” vs “SF”). Full-text search handles all variations automatically.

String fields (language, name, etc.):

{ field: 'language', op: 'Eq', value: 'Python' }
{ field: 'language', op: 'In', value: ['Python', 'JavaScript', 'Go'] }
{ field: 'language', op: 'NotIn', value: ['HTML', 'CSS'] }

Number fields (stars, issues):

{ field: 'stargazerCount', op: 'Gte', value: 1000 } // >= 1000 stars
{ field: 'stargazerCount', op: 'Lte', value: 50000 } // <= 50000 stars
{ field: 'stargazerCount', op: 'Gt', value: 999 } // > 999 stars (exclusive)
{ field: 'stargazerCount', op: 'Lt', value: 10000 } // < 10000 stars (exclusive)

Full-text search (description, readme, locations, bio, emails, resolved locations):

// Use ContainsAllTokens - handles formatting variations automatically
{ field: 'lastContributorLocations', op: 'ContainsAllTokens', value: 'San Francisco' }
{ field: 'readmePreview', op: 'ContainsAllTokens', value: 'react typescript' }
{ field: 'bio', op: 'ContainsAllTokens', value: 'kubernetes cloud' }
{ field: 'resolvedCity', op: 'ContainsAllTokens', value: 'San Francisco' }
{ field: 'resolvedCountry', op: 'ContainsAllTokens', value: 'United States' }
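
To illustrate why token matching tolerates differences in case and punctuation, here is a minimal local sketch of a contains-all-tokens check. The tokenizer (lowercase, split on non-alphanumeric characters) and both function names are assumptions made for illustration; the service's actual tokenization is not described here and may differ.

```typescript
// Hypothetical tokenizer: lowercases and splits on anything that is not a letter or digit.
// This is an assumption for illustration, not the service's documented behavior.
function tokenize(text: string): string[] {
  return text
    .toLowerCase()
    .split(/[^a-z0-9]+/)
    .filter((token) => token.length > 0);
}

// True when every token of `query` appears among the tokens of `fieldValue`.
function containsAllTokens(fieldValue: string, query: string): boolean {
  const fieldTokens = new Set(tokenize(fieldValue));
  return tokenize(query).every((token) => fieldTokens.has(token));
}

console.log(containsAllTokens("San Francisco, CA", "san francisco")); // true
console.log(containsAllTokens("san francisco bay area", "San Francisco")); // true
console.log(containsAllTokens("Seattle, WA", "San Francisco")); // false
```

An exact-match operator such as Eq would treat the first two field values as different strings, which is why token matching suits free-text fields better.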

Combining filters:

// AND - all conditions must match
{
op: 'And',
filters: [
{ field: 'language', op: 'Eq', value: 'Python' },
{ field: 'stargazerCount', op: 'Gte', value: 1000 }
]
}
// OR - any condition can match (use FTS for locations)
{
op: 'Or',
filters: [
{ field: 'resolvedCity', op: 'ContainsAllTokens', value: 'San Francisco' },
{ field: 'resolvedCity', op: 'ContainsAllTokens', value: 'Seattle' }
]
}
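
The filter shapes above can be modeled with a small set of TypeScript types, which makes nested And/Or trees easier to build without typos. This is a sketch inferred from the examples on this page: the type names and the helper functions (and, or, inCity) are hypothetical and are not part of the SDK.

```typescript
// Illustrative types inferred from the examples above; the SDK's real type names may differ.
type FilterValue = string | number | string[];

interface FieldFilter {
  field: string;
  op: "Eq" | "In" | "NotIn" | "Gte" | "Lte" | "Gt" | "Lt" | "ContainsAllTokens";
  value: FilterValue;
}

interface CompositeFilter {
  op: "And" | "Or";
  filters: Filter[];
}

type Filter = FieldFilter | CompositeFilter;

// Hypothetical helpers that keep nested filters readable.
const and = (...filters: Filter[]): CompositeFilter => ({ op: "And", filters });
const or = (...filters: Filter[]): CompositeFilter => ({ op: "Or", filters });
const inCity = (city: string): FieldFilter => ({
  field: "resolvedCity",
  op: "ContainsAllTokens",
  value: city,
});

// bio mentions kubernetes AND (city is San Francisco OR Seattle)
const filter = and(
  { field: "bio", op: "ContainsAllTokens", value: "kubernetes" },
  or(inCity("San Francisco"), inCity("Seattle")),
);

console.log(JSON.stringify(filter, null, 2));
```

The resulting object has the same shape as the hand-written filters on this page, so it could be passed as the filters argument of a search call.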

Repositories can be filtered by:

  • githubId - Node ID (string)
  • ownerLogin - Repository owner username (string)
  • name - Repository name (string)
  • stargazerCount - Star count (number, supports Gte/Lte/Gt/Lt)
  • language - Primary programming language (string)
  • totalIssuesCount - Total issues (number, supports Gte/Lte/Gt/Lt)
  • totalIssuesOpen - Open issues (number, supports Gte/Lte/Gt/Lt)
  • totalIssuesClosed - Closed issues (number, supports Gte/Lte/Gt/Lt)
  • lastContributorLocations - Contributor locations (string)

For complete field documentation, see the Repository Fields Reference.

// Get more results (max 1000)
const response = await client.searchRepos.search({
query: "data visualization",
maxResults: 500,
});
// Default is 100 results
const response2 = await client.searchRepos.search({
query: "data visualization",
});
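
If maxResults comes from user input, it can help to normalize it before calling the API. The sketch below applies the limits stated above (default 100, maximum 1,000). The helper name is hypothetical, and whether the API itself clamps or rejects out-of-range values is not stated here, so this is only a client-side guard.

```typescript
// Limits taken from the text above: default 100, maximum 1,000.
const DEFAULT_MAX_RESULTS = 100;
const MAX_RESULTS_LIMIT = 1000;

// Hypothetical helper: returns a maxResults value within the documented range.
function normalizeMaxResults(requested?: number): number {
  if (requested === undefined || !Number.isFinite(requested)) {
    return DEFAULT_MAX_RESULTS;
  }
  // Round down to an integer, then clamp to [1, MAX_RESULTS_LIMIT].
  return Math.min(Math.max(Math.floor(requested), 1), MAX_RESULTS_LIMIT);
}

console.log(normalizeMaxResults()); // 100
console.log(normalizeMaxResults(500)); // 500
console.log(normalizeMaxResults(5000)); // 1000
```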

User search uses BM25 full-text search - a keyword-based search algorithm optimized for finding relevant matches across text fields.

const response = await client.searchUsers.search({
query: "machine learning engineer san francisco",
});
// Searches across: emails (3x weight), login (2x weight), displayName, bio, company, location
// Email addresses are weighted highest for precision
console.log(response.users);
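
For intuition about how BM25 ranks by keywords, here is a small in-memory sketch of the standard Okapi BM25 formula with the common defaults k1 = 1.2 and b = 0.75. It scores a single text field and does not model the per-field weights mentioned above. The tokenizer, parameters, and sample profiles are assumptions for illustration, not the service's implementation.

```typescript
// Assumed tokenizer: lowercase, split on non-alphanumeric characters.
function tokenize(text: string): string[] {
  return text
    .toLowerCase()
    .split(/[^a-z0-9]+/)
    .filter((t) => t.length > 0);
}

// Okapi BM25 over a tiny in-memory corpus. Returns one score per document.
function bm25Scores(docs: string[], query: string, k1 = 1.2, b = 0.75): number[] {
  const tokenized = docs.map(tokenize);
  const n = tokenized.length;
  const avgLen = tokenized.reduce((sum, d) => sum + d.length, 0) / Math.max(n, 1);
  const queryTerms = tokenize(query);

  return tokenized.map((doc) => {
    let score = 0;
    for (const term of queryTerms) {
      const tf = doc.filter((t) => t === term).length; // term frequency in this document
      if (tf === 0) continue;
      const df = tokenized.filter((d) => d.includes(term)).length; // documents containing the term
      const idf = Math.log(1 + (n - df + 0.5) / (df + 0.5)); // rarer terms weigh more
      const lengthNorm = 1 - b + b * (doc.length / avgLen); // penalize longer documents
      score += (idf * tf * (k1 + 1)) / (tf + k1 * lengthNorm);
    }
    return score;
  });
}

// Made-up profile snippets, not real data.
const profiles = [
  "rust compiler developer",
  "python data scientist",
  "rust web developer",
];
const scores = bm25Scores(profiles, "rust compiler");
profiles.forEach((p, i) => console.log(`${scores[i].toFixed(3)}  ${p}`));
```

In this toy corpus the first profile scores highest because it contains both query terms, and the second scores zero because it contains neither. A profile that expresses the same idea in different words would also score zero, which is the practical difference from semantic search.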

BM25 ranks results by keyword relevance, not semantic meaning:

// Finds users with these keywords in their profile
await client.searchUsers.search({
query: "rust compiler developer",
});
// Better: combine search with filters for precision
await client.searchUsers.search({
query: "rust developer",
filters: {
field: "resolvedCountry",
op: "ContainsAllTokens",
value: "United States",
},
});

// Find developers in specific locations
const response = await client.searchUsers.search({
query: "senior engineer",
filters: {
op: "And",
filters: [
{
field: "resolvedCity",
op: "ContainsAllTokens",
value: "San Francisco",
},
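{
op: "Or",
filters: [
{ field: "company", op: "ContainsAllTokens", value: "Google" },
{ field: "company", op: "ContainsAllTokens", value: "Meta" },
{ field: "company", op: "ContainsAllTokens", value: "Apple" },
],
},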
],
},
maxResults: 100,
});

Users can be filtered by:

  • githubId - Node ID (string)
  • login - Username (string)
  • company - Company name (string)
  • location - User-provided location (string)
  • emails - Email addresses (string)
  • resolvedCountry - Resolved country from location (string)
  • resolvedState - Resolved state/region (string)
  • resolvedCity - Resolved city (string)

For complete field documentation, see the User Fields Reference.

For the values accepted by the language filter, see the list of supported languages.

Repository search understands context and meaning:

// Good - semantic search finds conceptually related repos
await client.searchRepos.search({
query: "lightweight frontend framework for single page applications",
});
// The above finds Vue, Svelte, etc. even without exact keyword matches

User search is keyword-based, not semantic:

// Good - specific keywords from likely profile fields
await client.searchUsers.search({
query: "typescript react developer",
});
// Less effective - too abstract for keyword search
await client.searchUsers.search({
query: "experienced frontend engineer with modern stack expertise",
});

Always prefer filters over trying to encode filtering in your query:

// Bad - trying to filter via search query
await client.searchRepos.search({
query: "python machine learning 1000+ stars",
});
// Good - use structured filters
await client.searchRepos.search({
query: "machine learning",
filters: {
op: "And",
filters: [
{ field: "language", op: "Eq", value: "Python" },
{ field: "stargazerCount", op: "Gte", value: 1000 },
],
},
});

Begin with a simple query to understand results, then add filters:

// Step 1: Broad search to see what's available
const broad = await client.searchRepos.search({
query: "api client library",
});
// Step 2: Add filters based on what you learned
const narrow = await client.searchRepos.search({
query: "api client library",
filters: {
op: "And",
filters: [
{ field: "language", op: "In", value: ["TypeScript", "JavaScript"] },
{ field: "stargazerCount", op: "Gte", value: 100 },
],
},
});

Both search types return relevance scores:

const response = await client.searchRepos.search({
query: "database orm",
});
response.repositories.forEach((repo) => {
// Repository scores are cosine distances, so lower scores are more relevant
console.log(`${repo.name}: ${repo.score}`);
});
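
The comment about cosine distance can be made concrete with a short sketch. It computes cosine distance (1 minus cosine similarity) between a query vector and a few toy vectors, then sorts ascending so the closest item comes first. The two-dimensional vectors and repository names are made up for illustration; real embeddings have many more dimensions and are computed by the service.

```typescript
// Cosine distance = 1 - cosine similarity; 0 means the vectors point the same way.
function cosineDistance(a: number[], b: number[]): number {
  if (a.length !== b.length || a.length === 0) {
    throw new Error("Vectors must be non-empty and of equal length");
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) {
    throw new Error("Cosine distance is undefined for a zero vector");
  }
  return 1 - dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Toy embeddings (hypothetical names and values).
const queryVector = [1, 0];
const toyRepos = [
  { name: "unrelated-repo", vector: [0, 1] },
  { name: "exact-match-repo", vector: [1, 0] },
  { name: "partial-match-repo", vector: [1, 1] },
];

// Sort ascending: the smallest distance is the most relevant.
const ranked = toyRepos
  .map((repo) => ({ name: repo.name, score: cosineDistance(queryVector, repo.vector) }))
  .sort((x, y) => x.score - y.score);

ranked.forEach((r) => console.log(`${r.name}: ${r.score.toFixed(3)}`));
```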

Find Popular Projects in Specific Language

const response = await client.searchRepos.search({
query: "web framework",
filters: {
op: "And",
filters: [
{ field: "language", op: "Eq", value: "Go" },
{ field: "stargazerCount", op: "Gte", value: 1000 },
],
},
maxResults: 50,
});

Find High-Quality Repos by Contributor Location

Use ContainsAllTokens on locations - handles all formatting variations automatically:

// Quality Python ML repos with San Francisco contributors
// (handles "San Francisco, CA" / "san francisco" / "SF, California" etc.)
const response = await client.searchRepos.search({
query: "machine learning",
filters: {
op: "And",
filters: [
{
field: "lastContributorLocations",
op: "ContainsAllTokens",
value: "San Francisco",
},
{ field: "language", op: "Eq", value: "Python" },
{ field: "stargazerCount", op: "Gte", value: 1000 },
],
},
maxResults: 50,
});

Use ContainsAllTokens on readmePreview or description to find repos by tech stack:

// Find React component libraries (checks README, not just name)
const response = await client.searchRepos.search({
query: "ui component library",
filters: {
op: "And",
filters: [
{
field: "readmePreview",
op: "ContainsAllTokens",
value: "react typescript",
},
{ field: "stargazerCount", op: "Gte", value: 500 },
],
},
});
// Find Kubernetes-native apps with quality threshold
const response2 = await client.searchRepos.search({
query: "cloud native application",
filters: {
op: "And",
filters: [
{
field: "description",
op: "ContainsAllTokens",
value: "kubernetes helm",
},
{ field: "stargazerCount", op: "Gte", value: 100 },
{ field: "language", op: "In", value: ["Go", "Rust"] },
],
},
});

// Active projects with good community engagement
const response = await client.searchRepos.search({
query: "web framework",
filters: {
op: "And",
filters: [
{ field: "stargazerCount", op: "Gte", value: 1000 },
{ field: "totalIssuesOpen", op: "Gte", value: 10 },
{ field: "totalIssuesClosed", op: "Gte", value: 50 },
{ field: "language", op: "NotIn", value: ["HTML", "CSS"] },
],
},
});

// Rust developers in major tech cities (use FTS for locations)
const response = await client.searchUsers.search({
query: "rust systems programming",
filters: {
op: "Or",
filters: [
{
field: "resolvedCity",
op: "ContainsAllTokens",
value: "San Francisco",
},
{ field: "resolvedCity", op: "ContainsAllTokens", value: "Seattle" },
{ field: "resolvedCity", op: "ContainsAllTokens", value: "New York" },
],
},
maxResults: 100,
});

Use ContainsAllTokens on bio to find specific expertise:

// Kubernetes experts in US tech hubs
const response = await client.searchUsers.search({
query: "infrastructure engineer",
filters: {
op: "And",
filters: [
{
field: "bio",
op: "ContainsAllTokens",
value: "kubernetes cloud native",
},
{
op: "Or",
filters: [
{
field: "resolvedCity",
op: "ContainsAllTokens",
value: "San Francisco",
},
{ field: "resolvedCity", op: "ContainsAllTokens", value: "Seattle" },
{ field: "resolvedCity", op: "ContainsAllTokens", value: "Austin" },
],
},
],
},
});

// ML engineers at AI companies (using email domains)
const response = await client.searchUsers.search({
query: "machine learning",
filters: {
op: "Or",
filters: [
{ field: "emails", op: "ContainsAllTokens", value: "@openai.com" },
{ field: "emails", op: "ContainsAllTokens", value: "@anthropic.com" },
{ field: "company", op: "ContainsAllTokens", value: "Google DeepMind" },
],
},
});

Keep these limitations in mind:

  • No pagination: Use maxResults to control result size (max 1,000)
  • No sorting: Results are automatically sorted by relevance score
  • No forceRefresh: Search uses indexed data; use crawl endpoints to update
  • No range queries in query string: Use structured filters with Gte/Lte operators
  • User search is not semantic: It’s keyword-based BM25, not embedding-based
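
Because there is no server-side pagination or custom sorting, one option is to fetch a single result set and reorder or page through it on the client. The sketch below assumes a hypothetical result shape with name and stargazerCount properties; the actual fields returned by the API should be checked against the Repository Fields Reference.

```typescript
// Hypothetical result shape; only these two properties are used here.
interface RepoResult {
  name: string;
  stargazerCount: number;
}

// Returns a new array sorted by star count, highest first (input is not mutated).
function sortByStars(repos: RepoResult[]): RepoResult[] {
  return [...repos].sort((a, b) => b.stargazerCount - a.stargazerCount);
}

// Splits an already-fetched result list into fixed-size pages.
function toPages<T>(items: T[], pageSize: number): T[][] {
  if (!Number.isInteger(pageSize) || pageSize < 1) {
    throw new Error("pageSize must be a positive integer");
  }
  const pages: T[][] = [];
  for (let i = 0; i < items.length; i += pageSize) {
    pages.push(items.slice(i, i + pageSize));
  }
  return pages;
}

// Made-up sample data standing in for response.repositories.
const sample: RepoResult[] = [
  { name: "alpha", stargazerCount: 120 },
  { name: "beta", stargazerCount: 4500 },
  { name: "gamma", stargazerCount: 900 },
];

const pages = toPages(sortByStars(sample), 2);
console.log(pages.map((page) => page.map((r) => r.name)));
```

Re-sorting discards the relevance order, so it is best applied after filters have already narrowed the results to relevant items.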