AI Agents for Stock Analysis: Using LLMs to Analyze Financial Documents

Leverage AI Agents for Fundamental Analysis of the Stock Market

Jan 23, 2025

This tutorial walks you through extracting meaningful insights from financial documents, such as company annual reports (Form 10-K), using AI agents built with Python, the OpenAI API, and LlamaIndex. By combining Retrieval-Augmented Generation (RAG) with AI agents, you’ll learn how to automate parts of fundamental analysis.

This code is nowhere near rigorous; it is meant to show the art of the possible. It is up to you, as the financial analyst, machine learning engineer, data scientist, or student, to customize the workflow for your own use case and market thesis.

Full code below the video.

Important Note: This video is not financial or investing advice. It is an educational tutorial on how to use AI agents in Python. Also, don't blindly trust LLM results without critical thinking or subject matter expertise 🧠. LLMs are still an experimental technology with high error rates.

######################################
#### Fundamental Analysis AI RAG Agent

### Libraries and Functions
# Embeddings
from llama_index.core import VectorStoreIndex, SimpleDirectoryReader, Settings
from llama_index.embeddings.openai import OpenAIEmbedding
# Agentic RAG
from llama_index.core.tools import QueryEngineTool
from llama_index.agent.openai import OpenAIAgent
from llama_index.llms.openai import OpenAI

### API Keys
OPENAI_API_KEY = "OpenAI API Key Goes Here"

### PDF Paths
# Configuration inputs
uploaded_files = ['/Users/DeepCharts/Fundamental Analysis Agent/01262024.pdf']  # Replace with your file paths

### PDF Processing, Embeddings, and Agentic RAG
## Processing and Embedding
# Load documents and create index
documents = []
for file_path in uploaded_files:
    docs = SimpleDirectoryReader(input_files=[file_path]).load_data()
    documents.extend(docs)

# Set up OpenAI embedding model
embed_model = OpenAIEmbedding(model='text-embedding-3-small', api_key=OPENAI_API_KEY)
Settings.embed_model = embed_model

# Create the vector store index
index = VectorStoreIndex.from_documents(documents)

## Agentic RAG
# Initialize LLM - use OpenAI and GPT-4o-mini
llm = OpenAI(api_key=OPENAI_API_KEY,
             model='gpt-4o-mini')
Settings.llm = llm

# Define Tool
query_engine = index.as_query_engine()
query_tool = QueryEngineTool.from_defaults(
    query_engine=query_engine,
    name="document_retrieval_tool",
    description="Tool to retrieve relevant information from document context."
)

# Agent initialization
agent = OpenAIAgent.from_tools(
    tools=[query_tool],
    llm=llm,
    verbose=True,
    system_prompt="""
        You are a financial risk analyst. Your task is to analyze 10-K financial documents to identify risks and assign a risk score (1-5, with 5 being the highest risk).
        Steps:
        1. Use the 'document_retrieval_tool' to extract key risk-related information from the 10-K document. This is the only way to access the data.
        2. Summarize the risks and assign a risk score based on the severity and frequency of identified risks.
        3. Provide a detailed explanation for the risk score.
        4. Please format the risk score exactly like "Overall Risk Score:***[score goes here]***"
    """
)

# Run the agent with the prompt
response = agent.chat("""
    You are a financial risk analyst. Your task is to analyze a 10-K financial document to identify risks and assign a risk score (1-5, with 5 being the highest risk).
    Use the 'document_retrieval_tool' to retrieve information. For example, you might query:
    - "What company is this document for, and what is the date of the filing?"
    - "List all risk factors in the document."
    - "Summarize key financial, legal, and compliance risks in the document."
    Summarize the retrieved information and assign a risk score based on the severity and frequency of the risks.
""")

# Get the response
agent_output = response.response
print(agent_output)


### Loop
def process_10k_document(file_path, embed_model_name='text-embedding-3-small', llm_model_name='gpt-4o-mini', api_key=None):
    """
    Processes a 10-K document, analyzes risks, and assigns a risk score.
    
    Args:
        file_path (str): The path to the 10-K document.
        embed_model_name (str): The embedding model name.
        llm_model_name (str): The LLM model name.
        api_key (str): The API key for OpenAI.
        
    Returns:
        str: The analysis and risk score for the document.
    """
    # Load the document
    docs = SimpleDirectoryReader(input_files=[file_path]).load_data()
    
    # Set up OpenAI embedding model
    embed_model = OpenAIEmbedding(model=embed_model_name, api_key=api_key)
    
    # Configure embedding model
    Settings.embed_model = embed_model
    
    # Create the vector store index
    index = VectorStoreIndex.from_documents(docs)
    
    # Initialize LLM
    llm = OpenAI(api_key=api_key, model=llm_model_name)
    Settings.llm = llm
    
    # Define Tool
    query_engine = index.as_query_engine()
    query_tool = QueryEngineTool.from_defaults(
        query_engine=query_engine,
        name="document_retrieval_tool",
        description="Tool to retrieve relevant information from document context."
    )
    
    # Agent initialization
    agent = OpenAIAgent.from_tools(
        tools=[query_tool],
        llm=llm,
        verbose=True,
        system_prompt="""
            You are a financial risk analyst. Your task is to analyze 10-K financial documents to identify risks and assign a risk score (1-5, with 5 being the highest risk; it can be a float).
            Steps:
            1. Use the 'document_retrieval_tool' to extract key risk-related information from the 10-K document. This is the only way to access the data.
            2. Assign a risk score from 1 to 5, with 5 being the highest risk. ***The final output should only contain this number, nothing else.***
        """
    )
    
    # Run the agent with the prompt
    response = agent.chat("""
        You are a financial risk analyst. Your task is to analyze a 10-K financial document to identify risks and assign a risk score (1-5, with 5 being the highest risk; it can be a float).
        Use the 'document_retrieval_tool' to retrieve information. For example, you might query:
        - "What company is this document for, and what is the date of the filing?"
        - "List all risk factors in the document."
        - "Summarize key financial, legal, and compliance risks in the document."
        Assign a risk score from 1 to 5, with 5 being the highest risk. ***The final output should only contain this number, nothing else.***
    """)
    
    # Return the response
    return response.response


# Process each file in uploaded_files and save results to a list
uploaded_files = [
    '/Users/DeepCharts/Fundamental Analysis Agent/8f311d9b-787d-45db-a6ea-38335ede9d47.pdf',
    '/Users/DeepCharts/Fundamental Analysis Agent/da27d24b-9358-4b5c-a424-6da061d91836.pdf',
    '/Users/DeepCharts/Fundamental Analysis Agent/4e32b45c-a99e-4c7d-b988-4eef8377500c.pdf',
    '/Users/DeepCharts/Fundamental Analysis Agent/01262024.pdf',
]

results = []  # List to store results
for file_path in uploaded_files:
    result = process_10k_document(file_path, api_key=OPENAI_API_KEY)
    results.append({"file": file_path, "result": result})

# Print all results
for entry in results:
    print(f"Results for {entry['file']}:\n{entry['result']}\n")
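
The loop above stores whatever text the agent returns, and an LLM will not always follow the "number only" instruction. As a rough sketch of one way to handle that, the snippet below parses a score out of the output, rejects anything outside the 1-5 range, and ranks the files by risk. The helper names `parse_risk_score` and `rank_by_risk` and the sample data are hypothetical; they are not part of the original tutorial or of LlamaIndex.

```python
import re
from typing import Optional

# Hypothetical helpers (not part of the original tutorial or of LlamaIndex).
# Matches the labeled format requested in the first system prompt,
# e.g. "Overall Risk Score:***[4]***" or "Overall Risk Score: 3.5".
_LABELED = re.compile(r"Overall Risk Score:\s*\**\s*\[?\s*(\d+(?:\.\d+)?)", re.IGNORECASE)
# Matches output that is nothing but a number, as requested in the loop prompt.
_NUMBER_ONLY = re.compile(r"\d+(?:\.\d+)?")


def parse_risk_score(text: str) -> Optional[float]:
    """Return a risk score between 1 and 5, or None if no valid score is found."""
    labeled = _LABELED.search(text)
    if labeled:
        candidate = labeled.group(1)
    else:
        # Tolerate surrounding whitespace and markdown asterisks, nothing else.
        candidate = text.strip().strip("*").strip()
        if not _NUMBER_ONLY.fullmatch(candidate):
            return None
    score = float(candidate)
    return score if 1.0 <= score <= 5.0 else None


def rank_by_risk(results: list) -> list:
    """Add a parsed 'score' to each result dict and sort highest risk first.

    Entries whose output could not be parsed are placed last.
    """
    scored = [{**entry, "score": parse_risk_score(entry["result"])} for entry in results]
    return sorted(scored, key=lambda e: (e["score"] is None, -(e["score"] or 0.0)))


# Made-up sample data in the same shape as the `results` list built above.
sample_results = [
    {"file": "a.pdf", "result": "3.5"},
    {"file": "b.pdf", "result": "Overall Risk Score:***[4]***"},
    {"file": "c.pdf", "result": "I could not determine a score."},
]

for entry in rank_by_risk(sample_results):
    print(entry["file"], entry["score"])
```

To use it with the real run, pass the `results` list from the loop to `rank_by_risk`. Entries with a score of `None` are worth reading by hand, since the agent either ignored the output format or returned a value outside the 1-5 range.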

Subscribe to the Deep Charts YouTube Channel for more informative AI and Machine Learning Tutorials.

Deep Charts
