arxiv_audio_summary/README.md

103 lines
5.1 KiB
Markdown
Raw Normal View History

2025-02-24 10:06:44 +00:00
# vibe: Article Summarization & TTS Pipeline
vibe is a Python-based pipeline that automatically fetches the latest Computer Science research articles from arXiv, filters them for relevance using a language model (LLM), converts article PDFs to Markdown with Docling, generates narrative summaries, and synthesizes the summaries into an MP3 audio file using a text-to-speech (TTS) system. This tool is ideal for users who prefer listening to curated research summaries on the go or integrating the process into a larger system via an API.
## Features
- **Fetch Articles:** Retrieves the latest Computer Science articles from arXiv.
- **Cache Mechanism:** Caches article metadata and converted content to speed up subsequent requests.
- **Relevance Filtering:** Uses an LLM to filter articles based on user-provided interests.
- **PDF Conversion:** Converts PDF articles to Markdown format using Docling.
- **Summarization:** Generates a fluid, narrative-style summary for each relevant article with the help of an LLM.
- **Text-to-Speech:** Converts the final narrative summary into an MP3 file using KPipeline.
- **Flask API:** Exposes the functionality via a RESTful endpoint for dynamic requests.
- **CLI and Server Modes:** Run the pipeline as a one-off CLI command or as a continuously running Flask server.
## Why Use vibe?
- **Stay Updated:** Automatically curate and summarize the latest research articles so you can keep up with advancements in your field.
- **Hands-Free Listening:** Enjoy audio summaries during your commute or while multitasking.
- **Automated Workflow:** Seamlessly integrate multiple processing steps—from fetching and filtering to summarization and TTS.
- **Flexible Deployment:** Use the CLI mode for quick summaries or deploy the Flask API for integration with other systems.
## Installation
1. **Prerequisites:**
Ensure you have Python 3.x installed on your system.
2. **Clone the Repository:**
Clone this repository to your local machine.
3. **Install Dependencies:**
Navigate to the project directory and install the required packages:
```
pip install -r requirements.txt
```
## Usage
### CLI Mode
Run the pipeline once to generate an MP3 summary file. For example:
```
python vibe.py --generate --prompt "I live in a mid-sized European city, working in the tech industry on AI-driven automation solutions. I prefer content focused on deep learning and reinforcement learning applications, and I want to filter out less relevant topics. Only include articles that are rated 9 or 10 on a relevance scale from 0 to 10." --max-articles 10 --output summary_cli.mp3
```
This command fetches the latest articles from arXiv, filters and ranks them based on your specified interests, generates narrative summaries, and converts the final summary into an MP3 file named `summary_cli.mp3`.
### Server Mode
Alternatively, you can run vibe as a Flask server:
```
python vibe.py --serve
```
Once the server is running, you can process requests by sending a POST request to the `/process` endpoint. For example:
```
curl -X POST http://127.0.0.1:5000/process \
-H "Content-Type: application/json" \
-d '{"user_info": "Your interests here", "max_articles": 5, "new_only": false}'
```
The server processes the articles, generates an MP3 summary, and returns the file as a downloadable response.
## Environment Variables
The following environment variables can be set to customize the behavior of vibe:
- `ARXIV_URL`: The URL used to fetch the latest arXiv articles. Defaults to `https://arxiv.org/list/cs/new`.
- `LLM_URL`: The URL for the language model endpoint. Defaults to `http://127.0.0.1:4000/v1/chat/completions` (this is a litellm instance).
- `MODEL_NAME`: The model name to be used by the LLM. Defaults to `mistral-small-latest`.
Note that using the `mistral-small` model through their cloud service typically costs a few cents per run and completes the summarization process in around 4 minutes. It is also possible to run vibe with local LLMs (such as qwen 2.5 14b or mistral-small), although these local runs may take up to an hour.
## Project Structure
- **vibe.py:** Main application file containing modules for:
- Fetching and caching arXiv articles.
- Filtering articles for relevance.
- Converting PDFs to Markdown using Docling.
- Summarizing articles via an LLM.
- Converting text summaries to speech (MP3) using KPipeline.
- Exposing a Flask API for processing requests.
- **requirements.txt:** Contains the list of Python packages required by the project.
- **CACHE_DIR:** Directory created at runtime for caching articles and processed files.
## Dependencies
The project relies on several key libraries:
- Flask
- requests
- beautifulsoup4
- soundfile
- docling
- kokoro
## Contributing
Contributions are welcome! Feel free to fork this repository and submit pull requests with improvements or bug fixes.
## License
This project is licensed under the MIT License.
## Acknowledgments
Thanks to the developers of [Docling](https://github.com/docling) and [Kokoro](https://github.com/kokoro) as well as the maintainers of BeautifulSoup and Flask for providing great tools that made this project possible.