Is it possible to use the ArXiv API instead of perplexity? #23

Open
LuOsorio opened this issue Feb 24, 2025 · 1 comment

Comments

@LuOsorio

In the "search_api" configuration, is it possible to use arXiv to retrieve scientific papers and use that information instead of a regular web search?

@bartolli

I've added EXA as a search API, and arXiv and PubMed as separate tools. It's super easy to integrate.

Here's the arXiv tool documentation:
https://python.langchain.com/docs/integrations/tools/arxiv/

Make sure that the response is formatted in the structure expected by deduplicate_and_format_sources.

Below is an example from my implementation:

configuration.py

...
class SearchAPI(Enum):
    PERPLEXITY = "perplexity"
    TAVILY = "tavily"
    EXA = "exa"
...
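If you would rather expose arXiv as a `search_api` option than as a separate tool, the enum could be extended the same way. This is only a sketch: the `ARXIV` member and its `"arxiv"` value are assumptions, not part of the snippet above.

```python
from enum import Enum


class SearchAPI(Enum):
    PERPLEXITY = "perplexity"
    TAVILY = "tavily"
    EXA = "exa"
    ARXIV = "arxiv"  # hypothetical new member, not in the original snippet


# Looking a member up by its configured string value
selected = SearchAPI("arxiv")
```

The string value is what the `if search_api == ...` checks in graph.py would compare against.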

graph.py

This logic appears in multiple places:

...
    # Search the web
    if search_api == "tavily":
        search_results = await tavily_search_async(query_list)
        source_str = deduplicate_and_format_sources(search_results, max_tokens_per_source=1000, include_raw_content=False)
    elif search_api == "perplexity":
        search_results = perplexity_search(query_list)
        source_str = deduplicate_and_format_sources(search_results, max_tokens_per_source=1000, include_raw_content=False)
    elif search_api == "exa":
        search_results = await exa_search(query_list)
        source_str = deduplicate_and_format_sources(search_results, max_tokens_per_source=1000, include_raw_content=False)
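Since the branches above differ only in which search function they call, one way to avoid repeating the chain in several places is a small lookup table. The sketch below is an assumption about how that could look, not code from the repository: `run_search`, `BACKENDS`, and the two `fake_*` stand-ins are hypothetical names, and the stubs only mimic the shape of real search functions.

```python
import inspect


async def fake_arxiv_search(query_list):
    # Hypothetical async backend; a real one would query arXiv here.
    return [{"query": q, "results": []} for q in query_list]


def fake_perplexity_search(query_list):
    # Hypothetical sync backend, mirroring the non-awaited perplexity call.
    return [{"query": q, "results": []} for q in query_list]


BACKENDS = {
    "arxiv": fake_arxiv_search,
    "perplexity": fake_perplexity_search,
}


async def run_search(search_api, query_list, backends):
    """Call the backend registered for search_api, awaiting it if needed."""
    try:
        search_fn = backends[search_api]
    except KeyError:
        raise ValueError(f"Unsupported search API: {search_api}") from None
    result = search_fn(query_list)
    if inspect.isawaitable(result):
        result = await result
    return result
```

With something like this, `deduplicate_and_format_sources` would be called once on the returned results instead of once per branch.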

In your arxiv_search method, ensure that the returned structure matches the format expected by deduplicate_and_format_sources:

"""
...
    Args:
        search_queries (List[SearchQuery]): List of search queries to process

    Returns:
        List[dict]: List of search responses, one per query. Each response should have the format:
            {
                'query': str,                    # The original search query
                'follow_up_questions': None,      
                'answer': None,
                'images': list,
                'results': [                     # List of search results
                    {
                        'title': str,            # Title of the search result
                        'url': str,              # URL of the result
                        'content': str,          # Summary/snippet of the content
                        'score': float,          # Relevance score
                        'raw_content': str|None  # Full content or None for secondary citations
                    },
                    ...
                ]
            }
...
"""

# Your search logic
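As a rough illustration of the formatting step only, the sketch below maps already-fetched paper records into the structure described in the docstring. The function name `to_search_response` and the input keys (`title`, `url`, `summary`, `full_text`) are assumptions made for this example; how the papers are fetched (LangChain tool, another client) is left out. The constant `score` is a placeholder, on the assumption that the arXiv data carries no relevance score.

```python
from typing import Any, Dict, List


def to_search_response(query: str, papers: List[Dict[str, Any]]) -> Dict[str, Any]:
    """Wrap paper records for one query in the expected response format."""
    results = []
    for paper in papers:
        results.append(
            {
                "title": paper["title"],
                "url": paper["url"],
                "content": paper["summary"],  # abstract used as the snippet
                "score": 1.0,  # placeholder: no relevance score assumed available
                "raw_content": paper.get("full_text"),  # None if not fetched
            }
        )
    return {
        "query": query,
        "follow_up_questions": None,
        "answer": None,
        "images": [],
        "results": results,
    }


# Example with a made-up record (illustrative data, not a real paper)
example = to_search_response(
    "graph neural networks",
    [{"title": "Example title", "url": "https://example.org/paper", "summary": "Example abstract."}],
)
```

An `arxiv_search` function would then return a list of these dictionaries, one per query.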

Let me know if you need any help.
