Uncover hidden connections in unstructured financial data with Amazon Bedrock and Amazon Neptune | Amazon Web Services

In asset management, portfolio managers need to closely monitor companies in their investment universe to identify risks and opportunities, and guide investment decisions. Tracking direct events like earnings reports or credit downgrades is straightforward—you can set up alerts to notify managers of news containing company names. However, detecting second and third-order impacts arising from events at suppliers, customers, partners, or other entities in a company’s ecosystem is challenging.

For example, a supply chain disruption at a key vendor would likely hurt downstream manufacturers. Or, if a supplier’s major client loses its top customer, the supplier faces demand risk. Such events often don’t make headlines that name the affected company directly, yet they still deserve attention. In this post, we demonstrate an automated solution that combines knowledge graphs and generative artificial intelligence (AI) to surface such risks by cross-referencing relationship maps with real-time news.

Broadly, this entails two steps. First, capture the intricate relationships between companies (customers, suppliers, directors) in a knowledge graph. Second, use this graph database together with generative AI to detect second- and third-order impacts of news events. For instance, this solution can highlight that delays at a parts supplier may disrupt production for downstream auto manufacturers in a portfolio, even though the news doesn’t mention any of them directly.

With AWS, you can deploy this solution in a serverless, scalable, and fully event-driven architecture. This post demonstrates a proof of concept built on two key AWS services well suited for graph knowledge representation and natural language processing: Amazon Neptune and Amazon Bedrock. Neptune is a fast, reliable, fully managed graph database service that makes it straightforward to build and run applications that work with highly connected datasets. Amazon Bedrock is a fully managed service that offers a choice of high-performing foundation models (FMs) from leading AI companies like AI21 Labs, Anthropic, Cohere, Meta, Stability AI, and Amazon through a single API, along with a broad set of capabilities to build generative AI applications with security, privacy, and responsible AI.

Overall, this prototype demonstrates the art of the possible with knowledge graphs and generative AI: deriving signals by connecting disparate dots. For investment professionals, the takeaway is the ability to stay on top of relevant developments, capturing the signal while filtering out the noise.

Build the knowledge graph

The first step in this solution is building a knowledge graph, and a valuable yet often overlooked data source for knowledge graphs is company annual reports. Because official corporate publications undergo scrutiny before release, the information they contain is likely to be accurate and reliable. However, annual reports are written in an unstructured format meant for human reading rather than machine consumption. To unlock their potential, you need a way to systematically extract and structure the wealth of facts and relationships they contain.

With generative AI services like Amazon Bedrock, you now have the capability to automate this process. You can take an annual report and trigger a processing pipeline to ingest the report, break it down into smaller chunks, and apply natural language understanding to pull out salient entities and relationships.
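As a rough illustration of the chunking step, the following Python sketch splits text already extracted from a report into chunks of roughly 1,000 words, the size mentioned in the solution overview. The function name `chunk_words` and the optional `overlap` parameter are assumptions for illustration, not part of the prototype, and PDF-to-text extraction is out of scope here.

```python
from __future__ import annotations


def chunk_words(text: str, max_words: int = 1000, overlap: int = 0) -> list[str]:
    """Split text into chunks of at most `max_words` whitespace-delimited words.

    Consecutive chunks share `overlap` words, which can keep short sentences
    that straddle a chunk boundary intact in one of the chunks.
    """
    if max_words <= 0:
        raise ValueError("max_words must be positive")
    if not 0 <= overlap < max_words:
        raise ValueError("overlap must be in the range [0, max_words)")

    words = text.split()  # simple whitespace tokenization, enough for a word budget
    step = max_words - overlap
    chunks: list[str] = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start : start + max_words]))
        if start + max_words >= len(words):
            break  # this chunk already reaches the end of the text
    return chunks


# Example: a 2,500-word document becomes three chunks.
report_text = " ".join(f"word{i}" for i in range(2500))
print([len(chunk.split()) for chunk in chunk_words(report_text)])  # [1000, 1000, 500]
```

A small overlap is one way to reduce the chance that a sentence describing a relationship is cut in half; whether the prototype uses overlap isn’t stated.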

For example, a sentence stating that “[Company A] expanded its European electric delivery fleet with an order for 1,800 electric vans from [Company B]” would allow Amazon Bedrock to identify the following:

  • [Company A] as a customer
  • [Company B] as a supplier
  • A supplier relationship between [Company A] and [Company B]
  • Relationship details of “supplier of electric delivery vans”

Extracting such structured data from unstructured documents requires providing carefully crafted prompts to large language models (LLMs) so they can analyze text to pull out entities like companies and people, as well as relationships such as customers, suppliers, and more. The prompts contain clear instructions on what to look out for and the structure to return the data in. By repeating this process across the entire annual report, you can extract the relevant entities and relationships to construct a rich knowledge graph.
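The following sketch shows one way such a prompt and its structured output might look. The prompt wording, the JSON shape, and the names `build_prompt`, `parse_relationships`, and `Relationship` are hypothetical; the relationship types are the ones listed in the workflow later in this post. The Amazon Bedrock call itself is omitted, so the code only builds the prompt and validates a model reply.

```python
from __future__ import annotations

import json
from dataclasses import dataclass

# Relationship types from the workflow; the uppercase labels are an assumption.
ALLOWED_RELATIONSHIPS = {"CUSTOMER", "SUPPLIER", "PARTNER", "COMPETITOR", "DIRECTOR"}

# Hypothetical prompt. Doubled braces are literal braces for str.format().
PROMPT_TEMPLATE = """You extract business relationships from a company annual report.
Main entity: {main_entity}
List every company or person in the text that has one of these relationships
to the main entity: {relationship_types}. Ignore generic terms such as "consumers".
Return only JSON in this shape:
{{"relationships": [{{"entity": "<name>", "entity_type": "COMPANY or PERSON", "relationship": "<type>", "details": "<short description>"}}]}}

<text>
{chunk}
</text>"""


@dataclass(frozen=True)
class Relationship:
    entity: str
    entity_type: str
    relationship: str
    details: str


def build_prompt(main_entity: str, chunk: str) -> str:
    """Fill the template for one text chunk."""
    return PROMPT_TEMPLATE.format(
        main_entity=main_entity,
        relationship_types=", ".join(sorted(ALLOWED_RELATIONSHIPS)),
        chunk=chunk,
    )


def parse_relationships(model_output: str) -> list[Relationship]:
    """Parse the model's reply, keeping only well-formed records with known types."""
    # Models sometimes wrap JSON in prose, so take the outermost {...} span.
    start, end = model_output.find("{"), model_output.rfind("}")
    if start == -1 or end < start:
        return []
    try:
        data = json.loads(model_output[start : end + 1])
    except json.JSONDecodeError:
        return []

    records = []
    for item in data.get("relationships", []):
        if not isinstance(item, dict):
            continue  # skip malformed entries
        name = str(item.get("entity", "")).strip()
        relationship = str(item.get("relationship", "")).upper()
        if name and relationship in ALLOWED_RELATIONSHIPS:
            records.append(
                Relationship(
                    entity=name,
                    entity_type=str(item.get("entity_type", "COMPANY")).upper(),
                    relationship=relationship,
                    details=str(item.get("details", "")),
                )
            )
    return records
```

For the example sentence above, with [Company A] as the main entity, a reply that lists [Company B] with relationship "supplier" and details "supplier of electric delivery vans" would parse into a single `Relationship` record, while entries with unknown relationship types are dropped.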

However, before committing the extracted information to the knowledge graph, you first need to disambiguate the entities. For instance, the knowledge graph may already contain a ‘[Company A]’ entity that represents a different organization with the same name. Amazon Bedrock can reason over and compare attributes such as business focus area, industry, and revenue-generating industries, as well as relationships to other entities, to determine whether the two entities are actually distinct. This prevents unrelated companies from being incorrectly merged into a single entity.
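The post doesn’t specify how the list of similar existing entities is found. As an assumption, the sketch below shortlists existing nodes by fuzzy name matching with Python’s `difflib`, then builds a prompt that asks the model to compare attributes and pick a match or none. The helper names, the 0.8 similarity threshold, and the legal-suffix list are illustrative choices, not the prototype’s.

```python
from __future__ import annotations

import json
import re
from difflib import SequenceMatcher

# Illustrative (incomplete) list of legal suffixes ignored when comparing names.
_LEGAL_SUFFIXES = re.compile(
    r"\b(inc|corp|corporation|ltd|limited|plc|llc|co|group|holdings)\b\.?"
)


def normalize(name: str) -> str:
    """Lowercase, drop legal suffixes and punctuation, collapse whitespace."""
    name = _LEGAL_SUFFIXES.sub(" ", name.lower())
    name = re.sub(r"[^a-z0-9 ]", " ", name)
    return " ".join(name.split())


def shortlist_candidates(
    name: str, existing: list[dict], threshold: float = 0.8, limit: int = 5
) -> list[dict]:
    """Return up to `limit` existing entities whose normalized names look similar."""
    target = normalize(name)
    scored = []
    for candidate in existing:
        score = SequenceMatcher(None, target, normalize(candidate["name"])).ratio()
        if score >= threshold:
            scored.append((score, candidate))
    scored.sort(key=lambda pair: pair[0], reverse=True)  # most similar first
    return [candidate for _, candidate in scored[:limit]]


def build_disambiguation_prompt(new_entity: dict, candidates: list[dict]) -> str:
    """Ask the model which candidate, if any, is the same organization."""
    return (
        "Decide whether NEW refers to the same organization as one of the CANDIDATES.\n"
        "Compare industry, business focus, revenue-generating segments, and relationships.\n"
        'Reply with JSON only: {"match_id": "<candidate id or null>", "reason": "<one sentence>"}\n'
        f"NEW: {json.dumps(new_entity, sort_keys=True)}\n"
        f"CANDIDATES: {json.dumps(candidates, sort_keys=True)}"
    )
```

Name matching only narrows the search. Deciding between, say, a logistics company and a confectionery company that share a name is left to the model, which is the purpose of the attribute comparison described above.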

After disambiguation is complete, you can reliably add new entities and relationships to your Neptune knowledge graph, enriching it with the facts extracted from annual reports. Over time, ingesting more reports and integrating additional trustworthy data sources will help build a comprehensive knowledge graph that can reveal insights through graph queries and analytics.
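Neptune supports openCypher, so one possible way to write a disambiguated relationship is a parameterized `MERGE` statement. The sketch below only builds the query text and its parameters. The `Company` and `Person` labels, the `id` property, the edge direction, and the function name `build_upsert` are assumptions, not the prototype’s actual schema. Relationship types and labels can’t be passed as query parameters in openCypher, which is why the code checks them against an allowlist before interpolating them.

```python
from __future__ import annotations

# Assumed schema: node labels, an `id` property, and these relationship types.
ALLOWED_LABELS = {"Company", "Person"}
ALLOWED_REL_TYPES = {"CUSTOMER", "SUPPLIER", "PARTNER", "COMPETITOR", "DIRECTOR"}


def build_upsert(
    main_id: str,
    other_id: str,
    other_name: str,
    other_label: str,
    rel_type: str,
    details: str,
) -> tuple[str, dict[str, str]]:
    """Build an openCypher statement linking an entity to the report's main company.

    The edge direction (other)-[:TYPE]->(main) is read as "other is TYPE of main",
    for example (Company B)-[:SUPPLIER]->(Company A).
    """
    # Labels and relationship types cannot be query parameters in openCypher,
    # so validate them against an allowlist before interpolating.
    if other_label not in ALLOWED_LABELS:
        raise ValueError(f"unsupported label: {other_label}")
    if rel_type not in ALLOWED_REL_TYPES:
        raise ValueError(f"unsupported relationship type: {rel_type}")

    query = (
        "MATCH (main:Company {id: $main_id}) "
        f"MERGE (other:{other_label} {{id: $other_id}}) "  # reuse the node if it exists
        "SET other.name = $other_name "
        f"MERGE (other)-[r:{rel_type}]->(main) "  # avoid duplicate edges
        "SET r.details = $details"
    )
    params = {
        "main_id": main_id,
        "other_id": other_id,
        "other_name": other_name,
        "details": details,
    }
    return query, params
```

You would then submit the query and parameters to your Neptune cluster’s openCypher endpoint. Using `MERGE` rather than `CREATE` keeps repeated ingestion of the same fact from adding duplicate nodes or edges.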

By automating extraction with generative AI, you can process thousands of annual reports, turning a source that would otherwise go untapped because of the prohibitive manual effort into an invaluable asset for knowledge graph curation.

The following screenshot shows an example of the visual exploration that’s possible in a Neptune graph database using the Graph Explorer tool.

Process news articles

The next step of the solution is automatically enriching portfolio managers’ news feeds and highlighting articles relevant to their interests and investments. For the news feed, portfolio managers can subscribe to any third-party news provider through AWS Data Exchange or another news API of their choice.

When a news article enters the system, an ingestion pipeline is invoked to process the content. Using techniques similar to those used for annual reports, Amazon Bedrock extracts entities, attributes, and relationships from the news article, which are then disambiguated against the knowledge graph to identify the corresponding entities.

The knowledge graph contains connections between companies and people, and by linking article entities to existing nodes, you can identify whether any subjects are within two hops of the companies that the portfolio manager has invested in or is interested in. Finding such a connection indicates the article may be relevant to the portfolio manager, and because the underlying data is represented in a knowledge graph, it can be visualized to help the portfolio manager understand why and how this context is relevant. In addition to identifying connections to the portfolio, you can also use Amazon Bedrock to perform sentiment analysis on the entities referenced.
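The two-hop check is essentially a bounded breadth-first search. The following sketch runs it in memory over an edge list, treating relationships as undirected, and returns the shortest connection path to each watched entity; in the prototype this traversal would presumably run as a query inside Neptune instead. The function name and the sample entities are hypothetical.

```python
from __future__ import annotations

from collections import defaultdict, deque


def find_connection_paths(
    edges: list[tuple[str, str]],
    start: str,
    watchlist: set[str],
    max_hops: int = 2,
) -> dict[str, list[str]]:
    """Return the shortest path from `start` to each watched entity within `max_hops`.

    Relationships are treated as undirected: a supplier and its customer are
    connected either way for the purpose of flagging relevance.
    """
    adjacency: dict[str, set[str]] = defaultdict(set)
    for a, b in edges:
        adjacency[a].add(b)
        adjacency[b].add(a)

    found: dict[str, list[str]] = {}
    visited = {start}
    queue = deque([[start]])  # each entry is a path; BFS dequeues shorter paths first
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node in watchlist:
            found[node] = path  # first arrival in BFS is a shortest path
        if len(path) - 1 >= max_hops:
            continue  # don't expand beyond the hop limit
        for neighbor in sorted(adjacency[node]):  # sorted for deterministic output
            if neighbor not in visited:
                visited.add(neighbor)
                queue.append(path + [neighbor])
    return found


# Hypothetical example: news mentions a parts supplier; the watchlist holds portfolio companies.
edges = [("Parts Supplier X", "Auto Maker Y"), ("Auto Maker Y", "Dealer Z"), ("Dealer Z", "Bank W")]
print(find_connection_paths(edges, "Parts Supplier X", {"Auto Maker Y", "Dealer Z", "Bank W"}))
# {'Auto Maker Y': ['Parts Supplier X', 'Auto Maker Y'], 'Dealer Z': ['Parts Supplier X', 'Auto Maker Y', 'Dealer Z']}
```

"Bank W" is three hops away and is therefore not reported. The returned paths are what makes the result explainable: each one shows the chain of relationships that links the news subject to a tracked company. If the news subject is itself on the watchlist, it is returned with a single-node path, which corresponds to a direct mention.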

The final output is an enriched news feed surfacing articles likely to impact the portfolio manager’s areas of interest and investments.

Solution overview

The following diagram illustrates the overall architecture of the solution.

The workflow consists of the following steps:

  1. A user uploads official reports (in PDF format) to an Amazon Simple Storage Service (Amazon S3) bucket. Use officially published reports (as opposed to news and tabloids) to minimize the amount of inaccurate data that enters your knowledge graph.
  2. The S3 event notification invokes an AWS Lambda function, which sends the S3 bucket and file name to an Amazon Simple Queue Service (Amazon SQS) queue. The First-In-First-Out (FIFO) queue makes sure that reports are ingested sequentially, which reduces the likelihood of introducing duplicate data into your knowledge graph.
  3. An Amazon EventBridge time-based event runs every minute and asynchronously starts an execution of an AWS Step Functions state machine.
  4. The Step Functions state machine runs through a series of tasks to process the uploaded document by extracting key information and inserting it into your knowledge graph:
    1. Receive the queue message from Amazon SQS.
    2. Download the PDF report file from Amazon S3, split it into multiple smaller text chunks (approximately 1,000 words each) for processing, and store the text chunks in Amazon DynamoDB.
    3. Use Anthropic’s Claude 3 Sonnet on Amazon Bedrock to process the first few text chunks and determine the main entity that the report refers to, together with relevant attributes (such as industry).
    4. Retrieve the text chunks from DynamoDB and, for each chunk, invoke a Lambda function that uses Amazon Bedrock to extract entities (such as a company or person) and their relationship (customer, supplier, partner, competitor, or director) to the main entity.
    5. Consolidate all extracted information.
    6. Filter out noise and irrelevant entities (for example, generic terms such as “consumers”) using Amazon Bedrock.
    7. Use Amazon Bedrock to perform disambiguation, reasoning over the extracted information and the list of similar entities from the knowledge graph. If the entity doesn’t exist, insert it; otherwise, reuse the entity that already exists in the knowledge graph. Then insert all extracted relationships.
    8. Clean up by deleting the SQS queue message and the S3 file.
  5. A user accesses a React-based web application to view the news articles that are supplemented with the entity, sentiment, and connection path information.
  6. Using the web application, the user specifies the number of hops (default N=2) on the connection path to monitor.
  7. Using the web application, the user specifies the list of entities to track.
  8. To generate fictional news, the user chooses Generate Sample News to generate 10 sample financial news articles with random content to be fed into the news ingestion process. Content is generated using Amazon Bedrock and is purely fictional.
  9. To download actual news, the user chooses Download Latest News to download the top news happening today (powered by NewsAPI.org).
  10. The news file (TXT format) is uploaded to an S3 bucket. Steps 8 and 9 upload news to the S3 bucket automatically, but you can also build integrations with your preferred news provider, such as AWS Data Exchange or any third-party provider, that drop news articles as files into the S3 bucket. News file content should be formatted as <date>{dd mmm yyyy}</date><title>{title}</title><text>{news content}</text>.
  11. The S3 event notification sends the S3 bucket and file name to a standard Amazon SQS queue, which invokes multiple Lambda functions to process the news data in parallel:
    1. Use Amazon Bedrock to extract entities mentioned in the news together with any related information, relationships, and sentiment of the mentioned entity.
    2. Check against the knowledge graph and use Amazon Bedrock to perform disambiguation by reasoning using the available information from the news and from within the knowledge graph to identify the corresponding entity.
    3. After the entity has been located, search for and return any connection paths connecting to entities marked with INTERESTED=YES in the knowledge graph that are within N=2 hops away.
  12. The web application automatically refreshes every second to pull the latest set of processed news for display.
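Because the news ingestion pipeline expects the file layout described in step 10, a small helper to write and read that layout can be useful when integrating another news source. The sketch below assumes the date uses English month abbreviations (for example, `05 Mar 2024`) and that the title and body don’t themselves contain the closing tags; the names `NewsArticle`, `format_news`, and `parse_news` are hypothetical.

```python
from __future__ import annotations

import re
from dataclasses import dataclass
from datetime import date, datetime

# Matches <date>...</date><title>...</title><text>...</text>, allowing whitespace between tags.
_NEWS_PATTERN = re.compile(
    r"<date>(?P<date>.*?)</date>\s*<title>(?P<title>.*?)</title>\s*<text>(?P<text>.*?)</text>",
    re.DOTALL,  # news bodies usually span multiple lines
)


@dataclass
class NewsArticle:
    published: date
    title: str
    text: str


def format_news(article: NewsArticle) -> str:
    """Serialize an article into the dd mmm yyyy tagged layout."""
    return (
        f"<date>{article.published.strftime('%d %b %Y')}</date>"
        f"<title>{article.title}</title>"
        f"<text>{article.text}</text>"
    )


def parse_news(content: str) -> NewsArticle:
    """Parse file content back into an article, raising ValueError if the layout is wrong."""
    match = _NEWS_PATTERN.search(content)
    if match is None:
        raise ValueError("expected <date>...</date><title>...</title><text>...</text>")
    published = datetime.strptime(match["date"].strip(), "%d %b %Y").date()
    return NewsArticle(published, match["title"].strip(), match["text"].strip())
```

Validating files with something like `parse_news` before dropping them into the S3 bucket is one way to catch malformed articles early rather than inside the Lambda functions.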

Deploy the prototype

You can deploy the prototype solution and start experimenting yourself. The prototype is available on GitHub and includes details on the following:

  • Deployment prerequisites
  • Deployment steps
  • Cleanup steps

Conclusion

This post demonstrated a proof-of-concept solution that helps portfolio managers detect second- and third-order risks from news events, even when the news doesn’t directly reference the companies they track. By combining a knowledge graph of intricate company relationships with real-time news analysis using generative AI, the solution can highlight downstream impacts, such as production delays caused by supplier hiccups.

Although it’s only a prototype, this solution shows the promise of knowledge graphs and language models to connect dots and derive signals from noise. These technologies can aid investment professionals by revealing risks faster through relationship mappings and reasoning. Overall, this is a promising application of graph databases and AI that warrants exploration to augment investment analysis and decision-making.

If this example of generative AI in financial services is of interest to your business, or you have a similar idea, reach out to your AWS account manager, and we will be delighted to explore further with you.


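About the Author

Xan Huang is a Senior Solutions Architect with AWS and is based in Singapore. He works with major financial institutions to design and build secure, scalable, and highly available solutions in the cloud. Outside of work, Xan spends most of his free time with his family and getting bossed around by his 3-year-old daughter. You can find Xan on LinkedIn.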
