I’m a big fan of Google’s NotebookLM, which can transform documents into podcasts. I’ve used it to distill the latest AI and machine learning research papers into audio summaries I can listen to while walking or working out at the gym.
These tools have a wide range of potential uses. For instance:
Students can turn their study notes into audio summaries.
Small businesses could produce podcasts and sound bites to showcase their products.
Teachers or educators could create audio lessons or summaries for their students, making learning more accessible and engaging.
Inspired by these possibilities, I put together an AI podcast generator in Python using AI agents and several generative AI and large language model tools. This step-by-step guide shows you how to create your own custom AI-generated podcasts. The full code is shared below the video.
AI Podcast Generator Code Breakdown
0. Setup Python Environment and Install Libraries
## Example Environment Creation Using Conda:
conda create --name podcast_env python=3.12.8 -y
conda activate podcast_env
## Required Libraries
pip install llama-index langchain elevenlabs pydub ipykernel
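Note: pydub relies on an external ffmpeg (or libav) binary for MP3 decoding and encoding. If it isn't already on your system, one way to add it to the same conda environment:

```shell
## pydub needs ffmpeg on the PATH for MP3 import/export
conda install -c conda-forge ffmpeg -y
```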
1. Libraries and Functions
To start, import the required libraries for embeddings, RAG, text-to-speech, and audio stitching.
# Libraries and Functions
from llama_index.core import VectorStoreIndex, SimpleDirectoryReader, Settings
from llama_index.embeddings.openai import OpenAIEmbedding
from llama_index.core.tools import QueryEngineTool
from llama_index.agent.openai import OpenAIAgent
from llama_index.llms.openai import OpenAI
from langchain.prompts import PromptTemplate
from elevenlabs import ElevenLabs, VoiceSettings
from pydub import AudioSegment
2. API Keys and Configuration
Save API keys and configure podcast settings like duration, PDF file path, personas, and voice IDs.
# API Keys
ELEVEN_LABS_API_KEY = 'API Key Goes Here'
OPENAI_API_KEY = 'API Key Goes Here'
# Configuration inputs
podcast_length = 3 # in minutes
uploaded_files = ['/path/to/document.pdf']
persona_1 = "Bob, a journalist" # Go into as much detail as you want
persona_2 = "Tricia, a science journalist" # Go into as much detail as you want
voice_1 = "Voice_ID_1" # Replace with desired Eleven Labs voice ID
voice_2 = "Voice_ID_2" # Replace with desired Eleven Labs voice ID
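Before running the pipeline, it can be worth catching placeholder API keys and missing PDF paths up front, before spending tokens or text-to-speech credits on a doomed run. Here is a minimal sketch of such a pre-flight check; the `validate_config` helper is hypothetical, not part of the pipeline above:

```python
import os

# Hypothetical pre-flight helper: flag placeholder API keys and
# PDF paths that don't point to a real file.
def validate_config(api_keys, files):
    bad_keys = [name for name, value in api_keys.items()
                if not value or "Goes Here" in value]
    bad_files = [path for path in files if not os.path.isfile(path)]
    return bad_keys, bad_files

bad_keys, bad_files = validate_config(
    {"OPENAI_API_KEY": "API Key Goes Here", "ELEVEN_LABS_API_KEY": "sk-123"},
    ["/path/that/does/not/exist.pdf"],
)
print(bad_keys)   # ['OPENAI_API_KEY']
print(bad_files)  # ['/path/that/does/not/exist.pdf']
```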
3. PDF Processing and Embeddings
Load documents, create embeddings, and build a vector store index.
# Load documents and create index
documents = []
for file_path in uploaded_files:
    docs = SimpleDirectoryReader(input_files=[file_path]).load_data()
    documents.extend(docs)
# Set up OpenAI embedding model
embed_model = OpenAIEmbedding(model='text-embedding-3-small', api_key=OPENAI_API_KEY)
Settings.embed_model = embed_model
# Create the vector store index
index = VectorStoreIndex.from_documents(documents)
4. Agentic Retrieval-Augmented Generation (RAG)
Initialize an agent to retrieve document context and generate a podcast script.
# Initialize LLM
llm = OpenAI(api_key=OPENAI_API_KEY, model='gpt-4o-mini')
Settings.llm = llm
# Define query engine
query_engine = index.as_query_engine()
query_tool = QueryEngineTool.from_defaults(
    query_engine=query_engine,
    name="document_retrieval_tool",
    description="Tool to retrieve relevant information from document context.",
)
# Create custom script prompt
average_words_per_minute = 150
max_words = podcast_length * average_words_per_minute
script_generation_prompt = PromptTemplate(
    template="""
You are an expert podcaster and writer. Given the context about the podcast, create a natural, fluid conversation between the two people acting as interviewers, in the format:
Interviewer 1: [insert the question or statement made by the interviewer]
Interviewer 2: [insert the question or statement made by the interviewer]
...
Use this format exactly. Do not include the interviewers' names or personas in the labels; use only "Interviewer 1:" or "Interviewer 2:".
The first interviewer will be named Interviewer 1 and will have the persona: {persona_1}.
The second interviewer will be named Interviewer 2 and will have the persona: {persona_2}.
IMPORTANT: The total length of the podcast should be around {podcast_length} minutes.
Ensure the script does not exceed {max_words} words.
Allocate time proportionally to each topic, keeping within the podcast's total length.
"""
)
# Agent initialization
agent = OpenAIAgent.from_tools(
    tools=[query_tool],
    llm=llm,
    verbose=True,
    # Use an f-string so the length limits are actually filled in
    system_prompt=f"""
You are an expert podcast script writer.
Your task is to create a detailed and engaging podcast script based on information found in a document.
You will follow these steps:
1. Use the 'document_retrieval_tool' to identify the key topics that should be discussed during the podcast.
2. Use the 'document_retrieval_tool' to retrieve specific details and information about the identified key topics.
3. Generate an outline of the podcast, with a specific time allocation for each topic. The sum of all topic durations must not exceed {podcast_length} minutes. Ensure each section is proportional to the overall length of the podcast.
4. Develop the script using this outline and the retrieved information, and ensure the script is conversational between the two interviewers, using the persona instructions.
5. IMPORTANT: Ensure the script is suitable for a podcast of {podcast_length} minutes and does not exceed {max_words} words.
""",
)
# Format the prompt for the agent
formatted_prompt = script_generation_prompt.format(
    persona_1=persona_1,
    persona_2=persona_2,
    podcast_length=podcast_length,
    max_words=max_words,
)
response = agent.chat(formatted_prompt)
script = response.response
print(script)
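Before sending the script to text-to-speech, an optional check (not in the original code) that it respects the word budget can save TTS credits. A minimal sketch with a toy script; the values mirror the configuration above:

```python
# Optional sanity check: compare the generated script's word count
# against the budget derived from the target duration.
podcast_length = 3                    # minutes
average_words_per_minute = 150
max_words = podcast_length * average_words_per_minute  # 450

sample_script = "Interviewer 1: Welcome to the show. Interviewer 2: Thanks for having me."
word_count = len(sample_script.split())
print(word_count, word_count <= max_words)  # → 12 True
```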
5. Text-to-Speech Conversion
Generate audio for each segment of the script using Eleven Labs.
# Initialize the Eleven Labs client
client = ElevenLabs(api_key=ELEVEN_LABS_API_KEY)
# Voice settings (defined here for reference; note they are not passed to
# the generate call below, which uses the voice's default settings)
voice_settings = VoiceSettings(stability=0.3, use_speaker_boost=True)
# Function to generate audio from text
def generate_audio(text, voice_id):
    audio = client.generate(
        text=text,
        voice=voice_id,
        model="eleven_multilingual_v2",
        output_format="mp3_44100_192",
    )
    return audio
6. Splitting the Script
Divide the script into segments by speaker.
def split_script(script):
    segments = []
    current_speaker = None
    current_segment = ""
    for line in script.split("\n"):
        line = line.strip()
        if line.startswith("Interviewer 1:") or line.startswith("Interviewer 2:"):
            # Flush the previous segment before starting a new speaker turn,
            # including consecutive turns by the same speaker
            if current_speaker and current_segment:
                segments.append((current_speaker, current_segment.strip()))
            current_speaker, _, current_segment = line.partition(":")
            current_segment = current_segment.strip()
        elif line:
            # Continuation line: append to the current speaker's segment
            current_segment += " " + line
    if current_speaker and current_segment:
        segments.append((current_speaker, current_segment.strip()))
    return segments
# Split script
script_segments = split_script(script)
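To sanity-check the splitter on a toy script, here is a standalone example (the function is repeated so the snippet runs on its own):

```python
# Same splitter as above: break a script into (speaker, text) turns,
# folding continuation lines into the current speaker's segment.
def split_script(script):
    segments = []
    current_speaker = None
    current_segment = ""
    for line in script.split("\n"):
        line = line.strip()
        if line.startswith("Interviewer 1:") or line.startswith("Interviewer 2:"):
            if current_speaker and current_segment:
                segments.append((current_speaker, current_segment.strip()))
            current_speaker, _, current_segment = line.partition(":")
            current_segment = current_segment.strip()
        elif line:
            current_segment += " " + line
    if current_speaker and current_segment:
        segments.append((current_speaker, current_segment.strip()))
    return segments

toy_script = """Interviewer 1: Welcome, Tricia.
Interviewer 2: Thanks, Bob. Today's paper is about transformers.
It builds on earlier attention work.
Interviewer 1: Let's dig in."""

segments = split_script(toy_script)
print(segments)
# → [('Interviewer 1', 'Welcome, Tricia.'),
#    ('Interviewer 2', "Thanks, Bob. Today's paper is about transformers. It builds on earlier attention work."),
#    ('Interviewer 1', "Let's dig in.")]
```

Note the multi-line Interviewer 2 turn is joined into a single segment, so it is sent to text-to-speech as one clip.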
7. Generating Audio Segments
Save individual audio clips for each speaker’s lines.
audio_files = []
for i, (speaker, segment) in enumerate(script_segments):
    if speaker == "Interviewer 1":
        audio_output = generate_audio(segment, voice_1)
        filename = f"segment_{i}_1.mp3"
    elif speaker == "Interviewer 2":
        audio_output = generate_audio(segment, voice_2)
        filename = f"segment_{i}_2.mp3"
    else:
        continue
    with open(filename, "wb") as f:
        for chunk in audio_output:
            f.write(chunk)
    audio_files.append(filename)
8. Stitching Audio Files
Combine all audio segments into one MP3 file.
# Combine audio segments
combined_audio = AudioSegment.empty()
for filename in audio_files:
    audio_segment = AudioSegment.from_mp3(filename)
    combined_audio += audio_segment
# Save final podcast
combined_audio.export("full_podcast.mp3", format="mp3")
print("Podcast saved as full_podcast.mp3")
Subscribe to the Deep Charts YouTube Channel for more informative AI and Machine Learning Tutorials.