NEOS-1593: Business name transformer #2892

Merged: 2 commits, Nov 3, 2024

@evisdrenova (Contributor) commented Oct 31, 2024

This PR adds a new transformer, Generate Business Name, that randomly generates a realistic-looking business name. We pre-generate roughly 1.6M business names as part of the go generate script and store them in the /transformers/data-sets/datasets directory.

If we're concerned about that .txt file being too big, we can reduce it. I took the approach that business names are more likely to need to be unique than first or last names, so we should err on the side of more rather than fewer. But if we want to knock it down to around 100k like the others, happy to update it.

Demo
https://www.loom.com/share/b7c01ba19b8c433e9bfca8f7b667ab57

Edit:

Did some benchmarking - here's what I'm seeing.

I checked the build time for each generator in the generators.go file, as well as memory and CPU usage.

Overall, it doesn't seem to spike memory or CPU, but the businessName generator does take about 4 seconds to run.
@nickzelei

| Generator | Avg Time (s) | Avg Memory (MB) | Avg CPU (%) | Time Range (s) |
| --- | --- | --- | --- | --- |
| emaildomains | 0.200 | 50.75 | 13.1 | 0.000 - 1.000 |
| first_names | 0.400 | 50.62 | 12.1 | 0.000 - 1.000 |
| last_names | 0.600 | 50.59 | 12.8 | 0.000 - 1.000 |
| business_names | 3.800 | 50.83 | 13.9 | 3.000 - 4.000 |
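The comment does not say how these numbers were collected. As a rough sketch of one way to time a generator in Go, the helper below measures wall-clock time and bytes allocated around a single call. `measure` and `fakeGenerator` are hypothetical names, the workload is invented, and this does not capture process memory or CPU percentage as the table does.

```go
package main

import (
	"fmt"
	"runtime"
	"time"
)

// measure runs fn once and reports the wall-clock time it took and the
// number of bytes allocated on the heap during the call.
func measure(fn func()) (time.Duration, uint64) {
	var before, after runtime.MemStats
	runtime.GC() // settle the heap so earlier garbage does not skew the reading
	runtime.ReadMemStats(&before)
	start := time.Now()
	fn()
	elapsed := time.Since(start)
	runtime.ReadMemStats(&after)
	// TotalAlloc is cumulative, so the difference is never negative.
	return elapsed, after.TotalAlloc - before.TotalAlloc
}

// fakeGenerator is a hypothetical workload standing in for a real generator.
func fakeGenerator() {
	names := make([]string, 0, 100000)
	for i := 0; i < 100000; i++ {
		names = append(names, fmt.Sprintf("Business %d", i))
	}
	_ = names
}

func main() {
	elapsed, allocated := measure(fakeGenerator)
	fmt.Printf("time=%s allocated=%.2f MB\n", elapsed, float64(allocated)/(1024*1024))
}
```

Averaging several runs of `measure` per generator would give figures comparable in spirit to the Avg Time column.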

@evisdrenova evisdrenova added the Feature Created by Linear-GitHub Sync label Oct 31, 2024
linear bot commented Oct 31, 2024

vercel bot commented Oct 31, 2024

The latest updates on your projects. Learn more about Vercel for Git ↗︎

| Name | Status | Preview | Comments | Updated (UTC) |
| --- | --- | --- | --- | --- |
| neosync-docs | ✅ Ready (Inspect) | Visit Preview | 💬 Add feedback | Oct 31, 2024 10:33pm |

The latest Buf updates on your PR. Results from workflow Buf / buf (pull_request).

| Build | Format | Lint | Breaking | Updated (UTC) |
| --- | --- | --- | --- | --- |
| ✅ passed | ✅ passed | ✅ passed | ✅ passed | Oct 31, 2024, 10:33 PM |

codecov bot commented Oct 31, 2024

Codecov Report

Attention: Patch coverage is 42.10526% with 77 lines in your changes missing coverage. Please review.

Project coverage is 38.71%. Comparing base (b28e5a6) to head (8bba519).
Report is 3 commits behind head on main.

| Files with missing lines | Patch % | Lines |
| --- | --- | --- |
| ...benthos/transformers/gen_generate_business_name.go | 22.22% | 41 Missing and 1 partial ⚠️ |
| ...pkg/benthos/transformers/generate_business_name.go | 67.30% | 11 Missing and 6 partials ⚠️ |
| backend/sql/postgresql/models/transformers.go | 0.00% | 8 Missing ⚠️ |
| ...nal/benthos/benthos-builder/builders/processors.go | 0.00% | 6 Missing ⚠️ |
| ...kg/benthos/transformers/transformer_initializer.go | 75.00% | 2 Missing and 1 partial ⚠️ |
| ...g/benthos/transformers/gen_neosync_transformers.go | 0.00% | 1 Missing ⚠️ |

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #2892      +/-   ##
==========================================
+ Coverage   38.69%   38.71%   +0.01%     
==========================================
  Files         315      317       +2     
  Lines       36140    36273     +133     
==========================================
+ Hits        13985    14042      +57     
- Misses      20462    20531      +69     
- Partials     1693     1700       +7     


@nickzelei (Member) left a comment

Nice! That is a big add: 3 million additions.

Did you find this to affect build times at all?

@@ -68,7 +68,7 @@ func main() {

 	for _, line := range lines {
 		trimmedLine := strings.TrimSpace(line)
-		if len(trimmedLine) == 0 || strings.ContainsAny(trimmedLine, " -_") {
+		if len(trimmedLine) == 0 || strings.ContainsAny(trimmedLine, "-_") {

Hm, I'm guessing there was a reason why I originally wrote this to exclude first/last names with spaces. But who knows if that is still relevant. Part of me wants to expose any bugs removing the space here would cause...
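
To make the effect of this one-character change concrete, here is a small sketch that runs the same condition with both character sets. `filterLines` is a hypothetical helper, the sample lines are made up, and it assumes the body of the `if` skips the line, which the hunk above does not show.

```go
package main

import (
	"fmt"
	"strings"
)

// filterLines trims each line and drops it if it is empty or contains
// any character from excluded. The skip behavior is an assumption about
// what the if-body in the diff does.
func filterLines(lines []string, excluded string) []string {
	kept := make([]string, 0, len(lines))
	for _, line := range lines {
		trimmed := strings.TrimSpace(line)
		if len(trimmed) == 0 || strings.ContainsAny(trimmed, excluded) {
			continue
		}
		kept = append(kept, trimmed)
	}
	return kept
}

func main() {
	lines := []string{"Acme Corp", "Smith-Jones", "Mary Ann", "  Olivia  ", ""}
	fmt.Println(filterLines(lines, " -_")) // old set: [Olivia]
	fmt.Println(filterLines(lines, "-_"))  // new set: [Acme Corp Mary Ann Olivia]
}
```

Multi-word business names need the space allowed, but the same change also lets multi-word first and last names such as "Mary Ann" through, which is the case this comment is wondering about.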

@evisdrenova evisdrenova merged commit 9fcf422 into main Nov 3, 2024
22 checks passed
@evisdrenova evisdrenova deleted the businessNameTransformer branch November 3, 2024 02:33