- Deadline: February 22, 2023
- Track: Machine Learning, Data Science, and Ethics in AI
- Abstract: ca. 100 words
- Description
- Word limit: ca. 500 words
- Content
- Software of interest
- Tools or techniques for more effective computing
- How scientific Python was applied to solve a research problem
- Structure: background/motivation, methods, results, and conclusion
- Links
- Websites
- IEEE SOCC 2022 paper: publication for Part 1
- SciPy 2019: background information
- Source code repositories: GitHub
- Figures
- Part 1
- Changes in the number of features over time
- Flowchart of the train/serving pipelines
- Performance comparison: speed and model performance
- Part 2
- Benchmark results for data tuning
- Fail signature comparison: existing vs. data-tuned
- Part 1
- Evidence of public speaking ability
- Austin Python meetup talk on YouTube
- Websites
- Audience (a broad range of people)
- Takeaways
- Links to source code, articles, blog posts, or other writing that adds context to the presentation
- Previous talk, tutorial, or other presentation information
- Part 1: data preprocessing pipeline for non-stationary data
- Part 2: data tuning
- Synthetic and non-proprietary
- High dimensionality
- High heterogeneity
- High sparsity
- Type changes over time
- Number of features changes over time
- Timestamps
- Object data types including lists
- Arbitrary feature names (to reflect that the meaning of the features is hard to interpret)
- Comparing type inference between pandas and NumPy
- Demo of the entire flow
- Data tuning
- Caching and memoization
- Environment
- Modules
- Schema inference
- Resolving mismatches after deployment
- Prefect flow
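
A minimal sketch for the type inference, schema inference, and mismatch items above. It is an illustration only, not the pipeline from the paper: the column names (`f0`, `f1`, `f2`) and the helpers `infer_schema` and `find_mismatches` are hypothetical, and the data is synthetic.

```python
import numpy as np
import pandas as pd

# Hypothetical feature column: numeric values with one missing entry.
values = [1, 2.5, None]

np_arr = np.array(values)   # NumPy keeps None as a Python object
pd_ser = pd.Series(values)  # pandas converts None to NaN

print(np_arr.dtype)  # object
print(pd_ser.dtype)  # float64


def infer_schema(df):
    """Map each column name to the dtype string pandas inferred for it."""
    return {col: str(dtype) for col, dtype in df.dtypes.items()}


def find_mismatches(expected, observed):
    """Return columns whose dtype differs or that exist in only one schema."""
    mismatches = {}
    for col in sorted(set(expected) | set(observed)):
        exp, obs = expected.get(col), observed.get(col)
        if exp != obs:
            mismatches[col] = (exp, obs)
    return mismatches


# Assumed scenario: f1 changes type and f2 appears after training.
train = pd.DataFrame({"f0": [1.0, 2.0], "f1": [1, 2]})
serving = pd.DataFrame(
    {"f0": [1.0, 2.0], "f1": [1.5, 2.5], "f2": [True, False]}
)

mismatches = find_mismatches(infer_schema(train), infer_schema(serving))
print(mismatches)
```

Storing dtypes as strings keeps the schema easy to serialize and compare; how each mismatch should be resolved (cast, drop, or retrain) is left open here.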