Earnings call extraction demo #5

cpfiffer · 2024-10-24T19:43:06Z

Still needs a little polish, but the idea is that the model acts as an analyst. The analyst will extract all relevant information from an earnings call, including:

name
ticker
headline takeaways
financial metrics
sentiment
reasoning about firm risks (used to guide the model for later)
recommendation for buy/sell/hold
a review of correctness (are all the numbers correct?)
a flag for whether the data should be re-processed
- currently this is unused, but my thinking is that we can pass the first result to the model again and ask it to take the comments into account. Kind of a judge model.
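A minimal sketch of how the re-process flag could drive a second pass. Everything here is hypothetical: `Extraction`, `reprocess`, and `stub_extract` are illustrative names that are not in this PR, and the stub stands in for a real model call.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Extraction:
    """Hypothetical container for one pass over a transcript."""
    ticker: str
    revenue_millions: Optional[float]
    review: str            # the model's comments on its own numbers
    needs_reprocess: bool  # the flag that is currently unused


def reprocess(
    transcript: str,
    extract_fn: Callable[[str, Optional[Extraction]], Extraction],
    max_rounds: int = 3,
) -> Extraction:
    """Call extract_fn until it stops asking for a re-run or the budget runs out."""
    result = extract_fn(transcript, None)
    rounds = 1
    while result.needs_reprocess and rounds < max_rounds:
        # Feed the previous attempt (including its review) back in.
        result = extract_fn(transcript, result)
        rounds += 1
    return result


def stub_extract(transcript: str, previous: Optional[Extraction]) -> Extraction:
    """Stand-in for a model call: wrong units first, corrected on the second pass."""
    if previous is None:
        return Extraction("AAPL", 124.0, "Revenue looks like billions, not millions.", True)
    return Extraction("AAPL", 124000.0, "Units are consistent.", False)


final = reprocess("...transcript text...", stub_extract)
print(final)
```

The `max_rounds` cap is there so a model that always sets the flag cannot loop forever; the value 3 is arbitrary.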

cpfiffer · 2024-10-24T21:01:10Z

@willkurt I think this is ready for a review!

willkurt

Good start! Here are a couple of things I'd like to see changed:

I really want to see this use generate.regex to have this go straight to CSV. There are a couple of reasons for this:
- This shows off something that simply cannot be done with JSON-mode on other platforms, and can be used as a great example for near-term content explaining why structured generation is not just JSON-mode.
- The point of LLMs is to not write code, so there's no reason not to go straight to CSV
- Right now this example mimics your other demos very closely, so there's not a lot of new insight into how to think about structured gen. Our support for Pydantic models is awesome, but not the only way to use Outlines.
- It's good for you to become more familiar with using regular expressions for structured gen. At the end of the day the heart of structured generation is regular languages, and to help improve Outlines in the future, everyone on the team needs to have a deep understanding of this.
I think this example could be simpler! We just want to give enough so that the user gets a feel for how they could extend it. Additionally, we're asking the model to do some things here that I'm not really sure it can (do you earnestly think those are good buy/sell recommendations?). When I think of earnings call transcript extraction, I think mostly about not having to hand extract certain figures (basically being able to quickly replicate what ycharts has). So the focus should be on a demo that actually works, even if that demo is smaller.
Related: we absolutely need some sort of simple evaluation for this. People are inherently suspicious of LLMs, and doubly so in the case of financial data. As a reader I want to see that this can earnestly replace reading the earnings transcript.
- These evals can be stupidly simple, we just need to show that this works.
- For the demo to be ready, the results need to be good, but that can be achieved by sub-setting the problems to a case that works well.
This one is optional, but it would be nice to remove the modal dependency so this demo can be easily run locally. Definitely more in the "nice to have" category, and of course depends on your own compute resources.

cpfiffer · 2024-10-25T18:33:58Z

Alright cool, thanks for the comments.

I think I agree that this could be much simpler, lots of time spent on IO.
Re: CSV/JSON, I think there are some extra considerations, but in general I agree that it would be nice to show off what JSON mode cannot do. We don't really showcase regular expression stuff as much as we could.
- One thing I kept running into was that the model had a lot of difficulty extracting the correct figures without extra reasoning/unstructured components.
- A common case was when speakers gave absolute revenue growth in numbers instead of total revenue.
- I added a few planning/verification/understanding components which helped substantially.
- Admittedly, my experimentation with the CSV formatting was on a simpler case, and I haven't dug into how this would look in a pure CSV case. I'll tinker with it and see what you think; perhaps the number extraction problem is less of an issue than I'd imagined.
- I do need more experience on regular expressions -- happy to put time into becoming stronger there.
Re: simplicity, definitely agree. I tried to compress stuff but ultimately this is a complicated system and may not transfer to a blog post well.
- In this case, I might actually prefer to leave this code in the demos, but also add a CSV example that we boost more heavily elsewhere.
- This is a solid example -- it shows people how to design a system to extract information, add understanding, multi-step processing, reflection, and how to mix quantitative and qualitative information without having to write weird and complicated agent systems.
- We may also wish to pick a few figures that reliably appear without pulling too many in. Lots of earnings calls do not mention headline numbers -- even revenue is sparingly mentioned. I think we can demonstrate this through evals as you mentioned.
- Re: analyst recommendations -- no, I don't believe that analyst recommendations are any good. As someone who has studied analyst recommendations extensively, none of the real ones are particularly good either. I'm fine with pulling them out but I do think it shows how to ask models to perform reasoning about a company's statements and then conclude with a single qualitative point.
Re: being too similar to previous works. Yes, I agree. I'm torn: repeating programming paradigms helps hammer home a point, but there is also a really strong case to be made that we want a more diverse portfolio of content. I'll happily do a CSV example, but it might be nice to clarify for ourselves the value of diversity vs. specificity in programming paradigms.
Re: evals, yes, absolutely agree.
- I can hand-code some examples to show how often the models capture the correct value.
- Maybe something like the percentage of times that the model matches the true value.
Re: Modal, I included both for the GPU-rich and for the GPU-poor. In general I think we need to be sensitive to both groups of people. I hope that it didn't muddy the waters too much but I can see how it might be unnecessarily complicated. I'll remove it and consolidate the code so it's easier to copy/paste -- let the user decide how they choose to run this.
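For the evals point, a minimal sketch of the "percentage of times that the model matches the true value" idea. The function name `field_accuracy`, the 1% tolerance, and the toy values are all assumptions made for illustration; they are not real labels or model outputs.

```python
from typing import Dict, Optional

Row = Dict[str, Optional[float]]


def field_accuracy(
    predicted: Dict[str, Row],
    truth: Dict[str, Row],
    field: str,
    rel_tol: float = 0.01,
) -> float:
    """Share of hand-labeled transcripts where the extracted field matches."""
    hits = 0
    for key, expected_row in truth.items():
        expected = expected_row[field]
        got = predicted.get(key, {}).get(field)
        if expected is None or got is None:
            # A missing value only counts as correct if both sides are missing.
            hits += expected is got
        else:
            hits += abs(got - expected) <= rel_tol * abs(expected)
    return hits / len(truth)


# Toy placeholders, not real earnings figures.
truth = {
    "call_a": {"revenue": 100.0},
    "call_b": {"revenue": 250.0},
    "call_c": {"revenue": None},
}
predicted = {
    "call_a": {"revenue": 100.5},   # within 1% of the label
    "call_b": {"revenue": 25.0},    # off by a factor of ten
    "call_c": {"revenue": None},    # correctly reported as not mentioned
}

accuracy = field_accuracy(predicted, truth, "revenue")
print(f"revenue accuracy: {accuracy:.0%}")
```

Treating "not mentioned" as a label in its own right matters here, since many calls do not state the headline numbers at all.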

My sense is that the desired demo is not this demo, so we should decide how much of this to save elsewhere. I can use the same data stuff but most of this is not particularly applicable to the CSV example, with the exception of a few parts related to data processing.

I do think a CSV example is a great idea and I'm happy to pivot towards that, so let me try a few things and see what I can do. I'll open a separate, simpler PR in case we want to do anything with this example as it stands.

cpfiffer · 2024-10-25T23:11:50Z

Working example of CSV extraction, though maybe a bit verbose.

csv_pattern = r"company_name,company_ticker,year,quarter,quarterly_revenue,quarterly_revenue_growth\n(\w+?),([A-Z]+?),(\d{4}),(q[1-4]),([0-9]+?){1},(\d*|null){1}"

Which yields

company_name,company_ticker,year,quarter,quarterly_revenue,quarterly_revenue_growth
Redfin,RDFN,2022,q1,597,123

Unfortunately, that 123 figure is not correct, since it refers to YoY growth and not quarterly growth. It's been tricky to get that to go away. I tried adding a "null" option but the model seems to have difficulty using it.
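One thing that can be checked without a model is what the pattern itself permits. The sketch below uses Python's `re` module only; `tightened_pattern` is a hypothetical variant (the character classes and the 1-5 letter ticker cap are assumptions), and it has not been run through `outlines.generate.regex`.

```python
import re

HEADER = "company_name,company_ticker,year,quarter,quarterly_revenue,quarterly_revenue_growth"

# The pattern from this comment, unchanged.
original_pattern = HEADER + r"\n(\w+?),([A-Z]+?),(\d{4}),(q[1-4]),([0-9]+?){1},(\d*|null){1}"

# Hypothetical tightened variant, not what produced the output above.
tightened_pattern = HEADER + r"\n" + (
    r"([A-Za-z0-9 .&'-]+),"      # company name, spaces and basic punctuation allowed
    r"([A-Z]{1,5}),"             # ticker; the 1-5 length cap is an assumption
    r"(\d{4}),"                  # year
    r"(q[1-4]),"                 # quarter
    r"(\d+),"                    # revenue in millions of dollars
    r"(-?\d+(?:\.\d+)?|null)"    # growth in percent, or the literal null
)

empty_growth = HEADER + "\nRedfin,RDFN,2022,q1,597,"
null_growth = HEADER + "\nRedfin,RDFN,2022,q1,597,null"

# \d* also matches the empty string, so the original accepts a blank growth field.
print(bool(re.fullmatch(original_pattern, empty_growth)))   # True
print(bool(re.fullmatch(tightened_pattern, empty_growth)))  # False
print(bool(re.fullmatch(tightened_pattern, null_growth)))   # True
```

A stricter pattern only constrains the shape of the output. It does not stop the model from putting the YoY figure in the growth column.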

Across all firms, this is

company_name,company_ticker,year,quarter,quarterly_revenue,quarterly_revenue_growth
AAPL,AAPL,2022,q1,123900,11
NVDA,NVDA,2023,q3,593,12
RDFN,RDFN,2022,q1,59,1230000000

None of these are quite correct. There are name problems (the first column contains tickers, not names), the revenues are incorrect, and revenue growth is either insane or the YoY growth rate rather than the quarterly growth rate.
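Some of these failures can be caught mechanically before anyone reads the rows. A rough sketch, using the three rows above as input; `sanity_flags` and the 1000% bound are assumptions made for illustration, and checks like these cannot tell a YoY figure from a quarterly one.

```python
import csv
import io

# Rows copied from the output above.
RAW = """company_name,company_ticker,year,quarter,quarterly_revenue,quarterly_revenue_growth
AAPL,AAPL,2022,q1,123900,11
NVDA,NVDA,2023,q3,593,12
RDFN,RDFN,2022,q1,59,1230000000
"""

MAX_ABS_GROWTH_PCT = 1000  # assumed plausibility bound, in percent


def sanity_flags(row: dict) -> list:
    """Return a list of cheap, rule-based warnings for one extracted row."""
    flags = []
    if row["company_name"] == row["company_ticker"]:
        flags.append("name equals ticker")
    growth = row["quarterly_revenue_growth"]
    if growth not in ("", "null") and abs(float(growth)) > MAX_ABS_GROWTH_PCT:
        flags.append("implausible growth")
    return flags


rows = list(csv.DictReader(io.StringIO(RAW)))
for row in rows:
    print(row["company_ticker"], sanity_flags(row))
```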

Here's an example of use:

import outlines
from transformers import AutoTokenizer

language_model = "microsoft/Phi-3-mini-128k-instruct"

# Load the model on the GPU for structured generation
model = outlines.models.transformers(
    language_model,
    device="cuda"
)

# Load the tokenizer (used only to apply the chat template)
TOKENIZER = AutoTokenizer.from_pretrained(language_model)

def to_prompt(user_prompt="", system_prompt=""):
    """Format the system and user messages with the model's chat template."""
    chat = []

    if len(system_prompt) > 0:
        chat.append({'role':'system', 'content':system_prompt})

    if len(user_prompt) > 0:
        chat.append({'role':'user', 'content':user_prompt})

    # tokenize=False returns the formatted prompt string directly,
    # so there is no need to tokenize and then decode again.
    return TOKENIZER.apply_chat_template(
        chat,
        tokenize=False,
        add_generation_prompt=True
    )

# Example
to_prompt(
    user_prompt="Please extract the data from the following text and output it in CSV format:\n\n{data}\n\nYou should have columns for name, age, and occupation.",
    system_prompt="You extract data from text and output it in CSV format."
)

data = """
Thank you, Tejas, and good afternoon. Today, we are proud to announce Apple's biggest quarter ever. Through the busy holiday season, we set an all-time revenue record of nearly $124 billion, up 11% from last year and better than we had expected at the beginning of the quarter. And we are pleased to see that our active installed base of devices is now at a new record with more than 1.8 billion devices.

We set all-time records for both developed and emerging markets and saw revenue growth across all of our product categories, except for iPad, which we said would be supply constrained. As expected, in the aggregate, we experienced supply constraints that were higher than the September quarter. Before I discuss our results in greater detail, I want to first acknowledge the toll that COVID continues to have on communities around the world. In many places, case counts are higher and health systems more strained than at any point throughout the pandemic.

On behalf of all of us at Apple, I want to extend our deep gratitude to the scientists, doctors, nurses, and so many others on the front lines of combating COVID-19. This is our eighth quarter reporting results in the shadow of the pandemic. And while I can't say it gets any easier, I can say I'm incredibly proud of the way our teams have come together and continue to innovate on behalf of our customers. A few weeks ago, we marked the 15th anniversary of the day Steve revealed iPhone to the world.
"""

def prompt_for_csv(data: str) -> str:
    return to_prompt(
        system_prompt="""
        You extract data from quarterly earnings call transcripts and output it in CSV format.

        The CSV should have columns for company name, company ticker, revenue, and revenue growth.
        """,
        user_prompt=f"""
        Please extract the data from the following text and output it in CSV
        format:\n\n{data}\n\n

        You should have columns for company name, company_ticker, revenue, and revenue growth.

        Revenue should be in units of millions of dollars, i.e.

        - 92,000,000 means 92 million dollars
        - 114 billion should be 114,000 million dollars

        Revenue growth should be in units of percentage. Extract the quarterly growth, not the year-over-year growth.

        When a value is not mentioned in the transcript, use "null" for that value. For example if the transcript
        says "year over year revenue was up 10%" and quarterly revenue growth is not mentioned, then the
        revenue growth should be set to null, as it is not mentioned in the transcript.

        Be as exact as possible. Use what is mentioned in the transcript.
        """
    )

print(prompt_for_csv(data))

csv_pattern = r"company_name,company_ticker,year,quarter,quarterly_revenue,quarterly_revenue_growth\n(\w+?),([A-Z]+?),(\d{4}),(q[1-4]),([0-9]+?){1},(\d*|null){1}"

csv_extractor = outlines.generate.regex(
    model,
    csv_pattern,
    sampler=outlines.samplers.multinomial()
)

def extract_csv(data: str) -> str:
    result = csv_extractor(prompt_for_csv(data), max_tokens=100)
    return result

print(extract_csv(data))
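The unit rules in the prompt can be mirrored in a small helper, which may be handy when hand-coding reference values to compare against. This is a sketch only: `to_millions` and `UNIT_TO_MILLIONS` are hypothetical names that are not part of the demo code.

```python
UNIT_TO_MILLIONS = {"thousand": 0.001, "million": 1.0, "billion": 1000.0}


def to_millions(amount: float, unit: str = "") -> float:
    """Convert a dollar amount to millions; an empty unit means raw dollars."""
    if unit == "":
        return amount / 1_000_000
    return amount * UNIT_TO_MILLIONS[unit.lower()]


# The two examples given in the prompt.
print(to_millions(92_000_000))      # 92.0
print(to_millions(114, "billion"))  # 114000.0
```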

cpfiffer · 2024-10-28T15:32:21Z

Question: I think the approach here in general is kind of clunky. The class-based approach I have above works well for extracting a large number of complicated, possibly optional fields with different units.

Directly translating to the CSV approach to extract headline numbers like revenue, growth rate, etc. doesn't really showcase how to handle this, largely because we would usually only get one row from an earnings transcript.

Alternatives:

Extracting analyst name and firm
Extracting Q&A questions and answers

Could also try just extracting all available metrics in a long format, using columns performance_measure and performance_value, maybe with a set of valid measures like "revenue (in billions of $)" or "revenue growth (YoY %)".
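A rough sketch of what the long-format pattern could look like, built from a list of valid measures. The names `VALID_MEASURES` and `long_pattern` and the 1-10 row limit are assumptions. The pattern is only checked against Python's `re` here; whether the structured-generation backend accepts every escape produced by `re.escape` is not verified.

```python
import re

VALID_MEASURES = ["revenue (in billions of $)", "revenue growth (YoY %)"]

# Alternation over the allowed measure names, escaped so "(", ")" and "$" are literal.
measure = "(?:" + "|".join(re.escape(m) for m in VALID_MEASURES) + ")"
value = r"-?\d+(?:\.\d+)?"
long_pattern = (
    r"performance_measure,performance_value"
    r"(?:\n" + measure + "," + value + r"){1,10}"
)

# Figures taken from the Apple excerpt: nearly $124 billion, up 11% from last year.
sample = (
    "performance_measure,performance_value\n"
    "revenue (in billions of $),124\n"
    "revenue growth (YoY %),11"
)

print(bool(re.fullmatch(long_pattern, sample)))

# Each data row splits cleanly on its last comma because measure names contain none.
parsed = [line.rsplit(",", 1) for line in sample.split("\n")[1:]]
print(parsed)
```

Unlike the one-row wide format, this yields several rows per transcript and makes the unit part of the measure name.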

cpfiffer added 7 commits October 23, 2024 16:56

First serious pass

3700fce

correct file naming convention, add default transcripts

92f3f05

fix JSON to CSV extraction

2aecd15

fix CUDA use for modal

867771e

Add requirements

149f0b9

clean up transcript interface

4bb2e9c

Add a detailed readme

a827f75

cpfiffer marked this pull request as ready for review October 24, 2024 21:01

cpfiffer assigned willkurt Oct 24, 2024

rlouf assigned cpfiffer and unassigned willkurt Oct 25, 2024

rlouf requested a review from willkurt October 25, 2024 07:36

willkurt requested changes Oct 25, 2024
