# vibe: Article Summarization & TTS Pipeline

vibe is a Python-based pipeline that automatically fetches the latest Computer Science research articles from arXiv, filters them for relevance using a language model (LLM), converts article PDFs to Markdown with Docling, generates narrative summaries, and synthesizes those summaries into an MP3 audio file using a text-to-speech (TTS) system. It is ideal for anyone who prefers listening to curated research summaries on the go, or who wants to integrate the process into a larger system via an API.

## Features

- **Fetch Articles:** Retrieves the latest Computer Science articles from arXiv.
- **Cache Mechanism:** Caches article metadata and converted content to speed up subsequent requests.
- **Relevance Filtering:** Uses an LLM to filter articles based on user-provided interests.
- **PDF Conversion:** Converts PDF articles to Markdown format using Docling.
- **Summarization:** Generates a fluid, narrative-style summary for each relevant article with the help of an LLM.
- **Text-to-Speech:** Converts the final narrative summary into an MP3 file using KPipeline.
- **Flask API:** Exposes the functionality via a RESTful endpoint for dynamic requests.
- **CLI and Server Modes:** Run the pipeline as a one-off CLI command or as a continuously running Flask server.

## Why Use vibe?

- **Stay Updated:** Automatically curate and summarize the latest research articles so you can keep up with advancements in your field.
- **Hands-Free Listening:** Enjoy audio summaries during your commute or while multitasking.
- **Automated Workflow:** Seamlessly integrate multiple processing steps, from fetching and filtering to summarization and TTS.
- **Flexible Deployment:** Use the CLI mode for quick summaries or deploy the Flask API for integration with other systems.

## Installation

1. **Prerequisites:**

   Ensure you have Python 3.x installed on your system.

2. **Clone the Repository:**

   Clone this repository to your local machine.

3. **Install Dependencies:**

   Navigate to the project directory and install the required packages (an optional virtual-environment setup is sketched just after these steps):

   ```
   pip install -r requirements.txt
   ```
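If you prefer to keep vibe's dependencies isolated from your system Python, a standard virtual environment works as well. This is a minimal, optional sketch rather than a required step:

```
# Optional: create and activate a virtual environment before installing
python -m venv .venv
source .venv/bin/activate   # on Windows: .venv\Scripts\activate
pip install -r requirements.txt
```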
## Usage

### CLI Mode

Run the pipeline once to generate an MP3 summary file. For example:

```
python vibe.py --generate --prompt "I live in a mid-sized European city, working in the tech industry on AI-driven automation solutions. I prefer content focused on deep learning and reinforcement learning applications, and I want to filter out less relevant topics. Only include articles that are rated 9 or 10 on a relevance scale from 0 to 10." --max-articles 10 --output summary_cli.mp3
```

This command fetches the latest articles from arXiv, filters and ranks them based on your specified interests, generates narrative summaries, and converts the final summary into an MP3 file named `summary_cli.mp3`.

### Server Mode

Alternatively, you can run vibe as a Flask server:

```
python vibe.py --serve
```

Once the server is running, you can process requests by sending a POST request to the `/process` endpoint. For example:

```
curl -X POST http://127.0.0.1:5000/process \
  -H "Content-Type: application/json" \
  -d '{"user_info": "Your interests here", "max_articles": 5, "new_only": false}'
```

The server processes the articles, generates an MP3 summary, and returns the file as a downloadable response.
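To save the returned audio straight to disk from the command line, you can add curl's standard `--output` flag to the request above; the output filename here is just an example:

```
# Same request as above, writing the MP3 response body to a local file
curl -X POST http://127.0.0.1:5000/process \
  -H "Content-Type: application/json" \
  -d '{"user_info": "Your interests here", "max_articles": 5, "new_only": false}' \
  --output summary_server.mp3
```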
## Environment Variables

The following environment variables can be set to customize the behavior of vibe:

- `ARXIV_URL`: The URL used to fetch the latest arXiv articles. Defaults to `https://arxiv.org/list/cs/new`.
- `LLM_URL`: The URL for the language model endpoint. Defaults to `http://127.0.0.1:4000/v1/chat/completions` (a local LiteLLM instance).
- `MODEL_NAME`: The model name to be used by the LLM. Defaults to `mistral-small-latest`.

Note that using the `mistral-small` model through Mistral's cloud service typically costs a few cents per run and completes the summarization in around 4 minutes. It is also possible to run vibe with local LLMs (such as Qwen 2.5 14B or Mistral Small), although local runs may take up to an hour.
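For example, you can export these variables in your shell before starting the server; the values below are simply the documented defaults, shown for illustration:

```
# Point vibe at the arXiv listing and LLM endpoint it should use (defaults shown)
export ARXIV_URL="https://arxiv.org/list/cs/new"
export LLM_URL="http://127.0.0.1:4000/v1/chat/completions"
export MODEL_NAME="mistral-small-latest"

python vibe.py --serve
```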
## Project Structure

- **vibe.py:** Main application file containing modules for:
  - Fetching and caching arXiv articles.
  - Filtering articles for relevance.
  - Converting PDFs to Markdown using Docling.
  - Summarizing articles via an LLM.
  - Converting text summaries to speech (MP3) using KPipeline.
  - Exposing a Flask API for processing requests.
- **requirements.txt:** Contains the list of Python packages required by the project.
- **CACHE_DIR:** Directory created at runtime for caching articles and processed files.

## Dependencies

The project relies on several key libraries:

- Flask
- requests
- beautifulsoup4
- soundfile
- docling
- kokoro

## Contributing

Contributions are welcome! Feel free to fork this repository and submit pull requests with improvements or bug fixes.

## License

This project is licensed under the MIT License.

## Acknowledgments

Thanks to the developers of [Docling](https://github.com/docling) and [Kokoro](https://github.com/kokoro), as well as the maintainers of BeautifulSoup and Flask, for providing great tools that made this project possible.