arxiv_audio_summary/README.md
Regis David Souza Mesquita 1f5b333dc6 Initial commit
2025-02-24 10:06:44 +00:00

5.1 KiB

vibe: Article Summarization & TTS Pipeline

vibe is a Python-based pipeline that automatically fetches the latest Computer Science research articles from arXiv, filters them for relevance using a language model (LLM), converts article PDFs to Markdown with Docling, generates narrative summaries, and synthesizes the summaries into an MP3 audio file using a text-to-speech (TTS) system. This tool is ideal for users who prefer listening to curated research summaries on the go or integrating the process into a larger system via an API.

Features

  • Fetch Articles: Retrieves the latest Computer Science articles from arXiv.
  • Cache Mechanism: Caches article metadata and converted content to speed up subsequent requests.
  • Relevance Filtering: Uses an LLM to filter articles based on user-provided interests.
  • PDF Conversion: Converts PDF articles to Markdown format using Docling.
  • Summarization: Generates a fluid, narrative-style summary for each relevant article with the help of an LLM.
  • Text-to-Speech: Converts the final narrative summary into an MP3 file using KPipeline.
  • Flask API: Exposes the functionality via a RESTful endpoint for dynamic requests.
  • CLI and Server Modes: Run the pipeline as a one-off CLI command or as a continuously running Flask server.

Why Use vibe?

  • Stay Updated: Automatically curate and summarize the latest research articles so you can keep up with advancements in your field.
  • Hands-Free Listening: Enjoy audio summaries during your commute or while multitasking.
  • Automated Workflow: Seamlessly integrate multiple processing steps—from fetching and filtering to summarization and TTS.
  • Flexible Deployment: Use the CLI mode for quick summaries or deploy the Flask API for integration with other systems.

Installation

  1. Prerequisites: Ensure you have Python 3.x installed on your system.

  2. Clone the Repository: Clone this repository to your local machine.

  3. Install Dependencies: Navigate to the project directory and install the required packages:

    pip install -r requirements.txt
    

Usage

CLI Mode

Run the pipeline once to generate an MP3 summary file. For example:

python vibe.py --generate --prompt "I live in a mid-sized European city, working in the tech industry on AI-driven automation solutions. I prefer content focused on deep learning and reinforcement learning applications, and I want to filter out less relevant topics. Only include articles that are rated 9 or 10 on a relevance scale from 0 to 10." --max-articles 10 --output summary_cli.mp3

This command fetches the latest articles from arXiv, filters and ranks them based on your specified interests, generates narrative summaries, and converts the final summary into an MP3 file named summary_cli.mp3.

Server Mode

Alternatively, you can run vibe as a Flask server:

python vibe.py --serve

Once the server is running, you can process requests by sending a POST request to the /process endpoint. For example:

curl -X POST http://127.0.0.1:5000/process \
     -H "Content-Type: application/json" \
     -d '{"user_info": "Your interests here", "max_articles": 5, "new_only": false}'

The server processes the articles, generates an MP3 summary, and returns the file as a downloadable response.

Environment Variables

The following environment variables can be set to customize the behavior of vibe:

  • ARXIV_URL: The URL used to fetch the latest arXiv articles. Defaults to https://arxiv.org/list/cs/new.
  • LLM_URL: The URL for the language model endpoint. Defaults to http://127.0.0.1:4000/v1/chat/completions (this is a litellm instance).
  • MODEL_NAME: The model name to be used by the LLM. Defaults to mistral-small-latest.

Note that using the mistral-small model through their cloud service typically costs a few cents per run and completes the summarization process in around 4 minutes. It is also possible to run vibe with local LLMs (such as qwen 2.5 14b or mistral-small), although these local runs may take up to an hour.

Project Structure

  • vibe.py: Main application file containing modules for:
    • Fetching and caching arXiv articles.
    • Filtering articles for relevance.
    • Converting PDFs to Markdown using Docling.
    • Summarizing articles via an LLM.
    • Converting text summaries to speech (MP3) using KPipeline.
    • Exposing a Flask API for processing requests.
  • requirements.txt: Contains the list of Python packages required by the project.
  • CACHE_DIR: Directory created at runtime for caching articles and processed files.

Dependencies

The project relies on several key libraries:

  • Flask
  • requests
  • beautifulsoup4
  • soundfile
  • docling
  • kokoro

Contributing

Contributions are welcome! Feel free to fork this repository and submit pull requests with improvements or bug fixes.

License

This project is licensed under the MIT License.

Acknowledgments

Thanks to the developers of Docling and Kokoro as well as the maintainers of BeautifulSoup and Flask for providing great tools that made this project possible.