
Commit

Merge pull request #121 from aws-solutions/feature/v2.2.0
Update to version v2.2.0
fhoueto-amz authored Sep 25, 2023
2 parents 132cf6b + 3cc89f6 commit 18d022f
Showing 258 changed files with 50,483 additions and 136,542 deletions.
2 changes: 1 addition & 1 deletion .github/ISSUE_TEMPLATE/bug_report.md
@@ -24,7 +24,7 @@ To get the version of the solution, you can look at the description of the creat
- [ ] Region: [e.g. us-east-1]
- [ ] Was the solution modified from the version published on this repository?
- [ ] If the answer to the previous question was yes, are the changes available on GitHub?
- [ ] Have you checked your [service quotas](https://docs.aws.amazon.com/general/latest/gr/aws_service_limits.html) for the sevices this solution uses?
- [ ] Have you checked your [service quotas](https://docs.aws.amazon.com/general/latest/gr/aws_service_limits.html) for the services this solution uses?
- [ ] Were there any errors in the CloudWatch Logs?

**Screenshots**
21 changes: 21 additions & 0 deletions CHANGELOG.md
@@ -5,6 +5,27 @@ All notable changes to this project will be documented in this file.
The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).

## [2.2.0] - 2023-09-21

### Updated
- Migrated to AWS CDK v2
- Migrated to AWS SDK V3
- Updated Node Lambda runtime to Node 18
- Implemented NewsCatcher locally instead of using the deprecated library
- Reddit comments Ingestion - Migrated from npm snoowrap package to Python praw library. Subreddit comment ingestion lambda is now using Python runtime
- Security patches for npm packages
- Updated outdated libraries
- Operational metrics to include additional deployment attributes for Reddit ingestion and attributes indicating if particular ingestion type is enabled

### Fixed
- YouTube search query when an OR (|) expression is used in the query parameter
- Reddit comment ingestion issue for highly active subreddits
- UrlLib issue in RssNewsFeed Ingestion Lambda

### Removed
- Python NewsCatcher library
- npm snoowrap package

## [2.1.4] - 2023-06-07

### Updated
131 changes: 91 additions & 40 deletions NOTICE.txt
@@ -1,46 +1,97 @@
Discovering Hot Topics using Machine Learning
Copyright 2020-2021 Amazon.com, Inc. or its affiliates. All Rights Reserved.

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
Copyright Amazon.com, Inc. or its affiliates. All Rights Reserved.
Licensed under the Apache License Version 2.0 (the "License"). You may not use this file except
in compliance with the License. A copy of the License is located at http://www.apache.org/licenses/
or in the "license" file accompanying this file. This file is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, express or implied. See the License for the
specific language governing permissions and limitations under the License.

**********************
THIRD PARTY COMPONENTS
**********************
This software includes third party software subject to the following copyrights:
AWS CDK - Apache-2.0
AWS SDK - Apache-2.0
AWS SDK Mock - Apache-2.0
AWS Solutions Constructs Library - Apache-2.0
boto3 - Apache-2.0
botocore - Apache-2.0
chai - MIT license
crhelper - Apache-2.0
googleapis - Apache-2.0
jest - MIT license
jmespath - MIT License
momentjs - MIT license
moto - Apache-2.0
newscatcher - MIT license
nock - MIT license
node - MIT license
openpyx - MIT license
pytest-cov - MIT license
pytest - MIT license
requests - Apache-2.0
sinon - BSD license
snoowrap - MIT license
tenacity - Apache-2.0
ts-jest - MIT license
ts-node - MIT license
twitter-lite - MIT License
typescript - Apache-2.0

@aws-cdk/aws-glue-alpha under Apache License 2.0
@aws-cdk/aws-lambda-python-alpha under Apache License 2.0
@aws-cdk/aws-servicecatalogappregistry-alpha under Apache License 2.0
@aws-solutions-constructs/aws-eventbridge-lambda under Apache License 2.0
@aws-solutions-constructs/aws-kinesisfirehose-s3 under Apache License 2.0
@aws-solutions-constructs/aws-kinesisstreams-lambda under Apache License 2.0
@aws-solutions-constructs/aws-lambda-dynamodb under Apache License 2.0
@aws-solutions-constructs/aws-lambda-s3 under Apache License 2.0
@aws-solutions-constructs/aws-lambda-stepfunctions under Apache License 2.0
@aws-solutions-constructs/aws-sqs-lambda under Apache License 2.0
@aws-solutions-constructs/core under Apache License 2.0
attrs under MIT License
aws-cdk-lib under Apache License 2.0
aws-sdk-client-mock under MIT License
aws-solutions-constructs under Apache License 2.0
boolean.py under BSD-2-Clause
boto3 under Apache License 2.0
botocore under Apache License 2.0
cachetools under MIT License
cdk-nag under Apache License 2.0
cffi under MIT License
chai under MIT license
constructs under Apache License 2.0
coverage under Apache License 2.0
crhelper under Apache License 2.0
crypto under ISC
cryptography under Apache Software License, BSD License (Apache License 2.0 OR BSD-3-Clause)
et-xmlfile under MIT License
exceptiongroup under MIT License
feedparser under BSD License (BSD-2-Clause)
googleapis under Apache License 2.0
googleapis-common-protos under Apache License 2.0
google-api-core under Apache 2.0
google-api-python-client under Apache 2.0
google-auth under Apache 2.0
google-auth-httplib2 under Apache 2.0
httplib2 under MIT License
iniconfig under MIT License
jest under MIT license
jinja under BSD License (BSD-3-Clause)
Jinja2 under BSD License (BSD-3-Clause)
jmespath under MIT License
license-expression under Apache License 2.0
MarkupSafe under BSD License (BSD-3-Clause)
momentjs under MIT license
moto under Apache License 2.0
newscatcher under MIT license
nock under MIT license
node under MIT license
openpyx under MIT license
openpyxl under MIT License
pluggy under MIT License
pytest-cov under MIT license
pytest under MIT license
praw under BSD License (BSD-2-Clause)
prawcore under BSD License
protobuf under BSD License (BSD-3-Clause)
pyasn1 under BSD License (BSD-2-Clause)
pyasn1-modules under BSD License
pycparser under BSD License
python-dateutil under Apache License 2.0, BSD License
requests under Apache License 2.0
requests-file under Apache 2.0
responses under Apache 2.0
rsa under Apache License 2.0
s3transfer under Apache 2.0
sgmllib3k under BSD License
sinon under BSD license
source-map-support under MIT License
tenacity under Apache License 2.0
tldextract under BSD License (BSD-3-Clause)
tomli under MIT License
types-PyYAML under Apache License 2.0
ts-jest under MIT license
ts-node under MIT license
twitter-lite under MIT License
@types/uuid under MIT License
typescript under Apache License 2.0
update-checker under BSD License
uritemplate under Apache Software License, BSD License (BSD 3-Clause License or Apache License, Version 2.0)
websocket-client under Apache License 2.0
Werkzeug under BSD License
xmltodict under MIT License

45 changes: 25 additions & 20 deletions README.md
@@ -10,6 +10,8 @@ This solution deploys an AWS CloudFormation template to automate data ingestion
- Reddit (comments from subreddits of interest)
- custom data in JSON or XLSX format

**Note**: Twitter ingestion is temporarily disabled starting with release v2.2.0 because Twitter has retired its v1 APIs.

This solution uses pre-trained machine learning (ML) models from Amazon Comprehend, Amazon Translate, and Amazon Rekognition to provide these benefits:

- **Detecting dominant topics using topic modeling**-identifies the terms that collectively form a topic.
@@ -19,7 +21,7 @@ This solution uses pre-trained machine learning (ML) models from Amazon Comprehe

The solution can be customized to aggregate other social media platforms and internal enterprise systems. The default CloudFormation deployment sets up custom ingestion configuration with parameters and an Amazon Simple Storage Service (Amazon S3) bucket to allow Amazon Transcribe Call Analytics output to be processed for natural language processing (NLP) analysis.

With minimal configuration changes in the custom ingestion functionality, this solution can ingest data from both internal systems and external data sources, such as transcriptions from call center calls, product reviews, movie reviews, and community chat forums including Twitch and Discord. This is done by exporting the custom data in JSON or XLSX format from the respective platforms and then uploading it to an Amazon Simple Storage Service (Amazon S3) bucket that is created when deploying this solution. More details on how to customize this feature, please refer Customizing Amazon Amazon S3 ingestion.
With minimal configuration changes in the custom ingestion functionality, this solution can ingest data from both internal systems and external data sources, such as transcriptions from call center calls, product reviews, movie reviews, and community chat forums including Twitch and Discord. This is done by exporting the custom data in JSON or XLSX format from the respective platforms and then uploading it to an Amazon Simple Storage Service (Amazon S3) bucket that is created when deploying this solution. For more details on how to customize this feature, refer to [Customizing Amazon S3 ingestion](https://docs.aws.amazon.com/solutions/latest/discovering-hot-topics-using-machine-learning/s3-ingestion.html).

For a detailed solution deployment guide, refer to [Discovering Hot Topics using Machine Learning](https://aws.amazon.com/solutions/implementations/discovering-hot-topics-using-machine-learning)

@@ -67,7 +69,7 @@ After you deploy the solution, use the included Amazon QuickSight dashboard to v
- aws-kinesisstreams-lambda
- aws-lambda-dynamodb
- aws-lambda-s3
- aws-lambda-step-function
- aws-lambda-stepfunctions
- aws-sqs-lambda

## Deployment
@@ -91,7 +93,8 @@ The solution is deployed using a CloudFormation template with a lambda backed cu
│ ├── ingestion-consumer [lambda function that consumes messages from Amazon Kinesis Data Streams]
│ ├── ingestion-custom [lambda function that reads files from Amazon S3 bucket and pushes data to Amazon Kinesis Data Streams]
│ ├── ingestion-producer [lambda function that makes Twitter API call and pushes data to Amazon Kinesis Data Stream]
│ ├── ingestion-reddit [lambda function that makes Reddit API call to retrieve comments from subreddits of interest and pushes data to Amazon Kinesis Data Stream]
│ ├── ingestion-publish-subreddit [lambda function that publishes Amazon EventBridge (CloudWatch) events for the subreddits to ingest from. Each event triggers the ingestion_reddit_comments lambda, which retrieves comments from the subreddit]
│ ├── ingestion_reddit_comments [lambda function that makes Reddit API call to retrieve comments from subreddits of interest and pushes data to Amazon Kinesis Data Stream]
│ ├── ingestion-youtube [lambda function that ingests comments from YouTube videos and pushes data to Amazon Kinesis Data Streams]
│ ├── integration [lambda function that publishes inference outputs to Amazon EventBridge]
│ ├── layers [lambda layer function library for Node and Python layers]
@@ -143,30 +146,26 @@ chmod +x ./run-all-tests.sh
./run-all-tests.sh
```

- Configure the bucket name of your target Amazon S3 distribution bucket
- Configure environment variables for build

Configure the environment variables below. Note: the values shown are examples only.
```
export DIST_OUTPUT_BUCKET=my-bucket-name
export VERSION=my-version
export DIST_OUTPUT_BUCKET=my-bucket-name # Global name of the distribution. The AWS Region is appended to this name (example: 'my-bucket-name-us-east-1') to form the regional bucket name. Upload the Lambda artifacts to the regional buckets so that the CloudFormation template can pick them up for deployment.
export SOLUTION_NAME=discovering-hot-topics-using-machine-learning # Name of this solution
export VERSION=my-version # Version number for the customized code
export CF_TEMPLATE_BUCKET_NAME=my-cf-template-bucket-name # Name of the S3 bucket where the CloudFormation templates should be uploaded
export QS_TEMPLATE_ACCOUNT=aws-account-id # AWS account ID from which the Amazon QuickSight templates are sourced for Amazon QuickSight Analysis and Dashboard creation
export DIST_QUICKSIGHT_NAMESPACE=my-quicksight-namespace # Amazon QuickSight namespace
```
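
As a rough illustration of how these variables fit together, the sketch below checks that each one is set before the build runs and derives the regional bucket name described in the `DIST_OUTPUT_BUCKET` comment. The helper names `require_vars` and `regional_bucket` are hypothetical and are not part of this solution's scripts; `us-east-1` is only an example Region.

```shell
# Hypothetical helper (not part of this repository): report every variable
# named in the arguments that is unset or empty, and return non-zero if any is.
require_vars() {
    missing=0
    for name in "$@"; do
        # Indirectly read the variable whose name is stored in $name.
        eval "value=\${$name:-}"
        if [ -z "$value" ]; then
            echo "missing environment variable: $name" >&2
            missing=1
        fi
    done
    return "$missing"
}

# Hypothetical helper: the regional bucket is the global name plus "-<region>".
# $1 = global bucket name, $2 = AWS Region
regional_bucket() {
    printf '%s-%s\n' "$1" "$2"
}
```

For example, `require_vars DIST_OUTPUT_BUCKET SOLUTION_NAME VERSION CF_TEMPLATE_BUCKET_NAME QS_TEMPLATE_ACCOUNT DIST_QUICKSIGHT_NAMESPACE` could run before the build script, and `regional_bucket "$DIST_OUTPUT_BUCKET" us-east-1` prints the bucket name expected in that Region.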

- Now build the distributable:
- Run the commands below to build the distributable:

```
cd <rootDir>/deployment
chmod +x ./build-s3-dist.sh
./build-s3-dist.sh $DIST_OUTPUT_BUCKET $SOLUTION_NAME $VERSION $CF_TEMPLATE_BUCKET_NAME QS_TEMPLATE_ACCOUNT
```
./build-s3-dist.sh $DIST_OUTPUT_BUCKET $SOLUTION_NAME $VERSION $CF_TEMPLATE_BUCKET_NAME $QS_TEMPLATE_ACCOUNT $DIST_QUICKSIGHT_NAMESPACE
- Parameter details

```
$DIST_OUTPUT_BUCKET - This is the global name of the distribution. For the bucket name, the AWS Region is added to the global name (example: 'my-bucket-name-us-east-1') to create a regional bucket. The lambda artifact should be uploaded to the regional buckets for the CloudFormation template to pick it up for deployment.
$SOLUTION_NAME - The name of This solution (example: discovering-hot-topics-using-machine-learning)
$VERSION - The version number of the change
$CF_TEMPLATE_BUCKET_NAME - The name of the S3 bucket where the CloudFormation templates should be uploaded
$QS_TEMPLATE_ACCOUNT - The account from which the Amazon QuickSight templates should be sourced for Amazon QuickSight Analysis and Dashboard creation
```

- When creating and using buckets, it is recommended to:
@@ -175,13 +174,19 @@ $QS_TEMPLATE_ACCOUNT - The account from which the Amazon QuickSight templates sh
- Ensure buckets are not public.
- Verify bucket ownership prior to uploading templates or code artifacts.

### 3. Upload deployment assets to your Amazon S3 buckets
- Deploy the distributable to an Amazon S3 bucket in your account. _Note:_ you must have the AWS Command Line Interface installed.

```
aws s3 cp ./global-s3-assets/ s3://my-bucket-name-<aws_region>/discovering-hot-topics-using-machine-learning/<my-version>/ --recursive --acl bucket-owner-full-control --profile aws-cred-profile-name
aws s3 cp ./regional-s3-assets/ s3://my-bucket-name-<aws_region>/discovering-hot-topics-using-machine-learning/<my-version>/ --recursive --acl bucket-owner-full-control --profile aws-cred-profile-name
aws s3 cp ./global-s3-assets/ s3://$CF_TEMPLATE_BUCKET_NAME/discovering-hot-topics-using-machine-learning/$VERSION/ --recursive --acl bucket-owner-full-control --profile aws-cred-profile-name
aws s3 cp ./regional-s3-assets/ s3://$DIST_OUTPUT_BUCKET-<aws_region>/discovering-hot-topics-using-machine-learning/$VERSION/ --recursive --acl bucket-owner-full-control --profile aws-cred-profile-name
```
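
The upload commands and the bucket-ownership recommendation can be combined into a small pre-flight routine. This is a minimal sketch, not the solution's own tooling: `s3_prefix` and `preflight_upload` are hypothetical names and the owner account ID is an assumed input. It relies on the AWS CLI's `s3api head-bucket --expected-bucket-owner` check and the `--dryrun` flag of `aws s3 cp`.

```shell
# Hypothetical helper: build the destination prefix used by the upload commands.
# $1 = bucket name, $2 = solution name, $3 = version
s3_prefix() {
    printf 's3://%s/%s/%s/\n' "$1" "$2" "$3"
}

# Hypothetical helper: confirm the bucket belongs to the expected account,
# then list what would be copied without uploading anything.
# $1 = local directory, $2 = bucket name, $3 = solution name, $4 = version,
# $5 = AWS account ID expected to own the bucket (assumed input)
preflight_upload() {
    aws s3api head-bucket --bucket "$2" --expected-bucket-owner "$5" || return 1
    aws s3 cp "$1" "$(s3_prefix "$2" "$3" "$4")" --recursive --dryrun
}
```

Calling `preflight_upload ./global-s3-assets/ "$CF_TEMPLATE_BUCKET_NAME" "$SOLUTION_NAME" "$VERSION" <account-id>` stops early if the bucket is owned by a different account. Dropping `--dryrun` would perform the real copy.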

### 4. Launch the CloudFormation template
- Get the link to the template uploaded to your Amazon S3 bucket (the $CF_TEMPLATE_BUCKET_NAME bucket from the previous step)
- Deploy the solution to your account by launching a new AWS CloudFormation stack


## Collection of operational metrics

This solution collects anonymous operational metrics to help AWS improve the quality and features of the solution. For more information, including how to disable this capability, please see the [implementation guide](https://docs.aws.amazon.com/solutions/latest/discovering-hot-topics-using-machine-learning/operational-metrics.html).