AI大模型+全文检索+tf-idf+向量数据库+我的文章 系列之7 - 开源项目postgres_finetuning_kb, 如何生成 “微调(fine-tuning)大语言模型(LLM)” 的问题素材?
digoal
2024-07-30
PostgreSQL , PolarDB , DuckDB , AI , macOS , ollama , docker , 数字人 , 大模型 , 微调 , prompt tunning , fine tunning , 向量数据库 , 全文检索 , tf-idf , 权重 , RAG
系列文章:
- 《AI大模型+全文检索+tf-idf+向量数据库+我的文章 系列之7 - 如何生成 “微调(fine-tuning)大语言模型(LLM)” 的问题素材?》
- 《AI大模型+全文检索+tf-idf+向量数据库+我的文章 系列之6 - 科普 : 大模型到底能干什么? 如何选型? 专业术语? 资源消耗?》
- 《AI大模型+全文检索+tf-idf+向量数据库+我的文章 系列之5 - 在 Apple Silicon Mac 上微调(fine-tuning)大型语言模型(LLM) 并发布GGUF 》
- 《AI大模型+全文检索+tf-idf+向量数据库+我的文章 系列之4 - RAG 自动提示微调(prompt tuning)》
- 《AI大模型+全文检索+tf-idf+向量数据库+我的文章 系列之3 - 微调后, 我的数字人变聪明了 》
- 《AI大模型+全文检索+tf-idf+向量数据库+我的文章 系列之2 - 我的数字人: 4000余篇github blog文章投喂大模型中》
- 《AI大模型+全文检索+tf-idf+向量数据库+我的文章 系列之1 - 低配macbook成功跑AI大模型
LLama3:8b
, 感谢ollama》
为了微调模型, 让模型更适合解决PostgreSQL相关的问题, 需要收集高质量的问题和答案. 为此我发起了一个开源项目(postgres_finetuning_kb), 将来微调出来的模型有多厉害, 能不能成为能打的日常小助手就靠大家了, 希望大家一起来贡献.
https://github.com/digoal/postgres_finetuning_kb
我的思路是这样的: 先找到PG相关的职业有哪些? 要成为这些职业的佼佼者, 需要掌握哪些知识点? 要考核是否已经掌握了某个知识点, 可以提出哪些问题?
大家的任务是让模型来生成问题和答案, 人工进行问题和答案的校对, 最终产出高质量的问题和答案.
postgres_finetuning_kb 开源项目 目录结构:
job_$name/skill_$name/qa_$level_$number.md
job_职业名称/skill_知识点名称/skill_难度(l|m|h)_问题编号(001).md
md文件内容格式:
## 职业
...
## 知识点
...
## 问题难度(l|m|h)
...
## 关联问题
...
## 问题
...
## 答案
...
## 校对信息
校对人1, 校对时间
...
https://www.promptingguide.ai/zh/introduction/settings
市面上有哪些PostgreSQL开源数据库相关的职业?
The world of databases offers a wide array of exciting career paths! Here's a breakdown of some popular database-related jobs:
Development & Engineering:
- Database Developer: Designs, develops, and implements relational databases (like PostgreSQL) or NoSQL databases (e.g., MongoDB). They write SQL queries, create stored procedures, optimize
performance, and ensure data integrity. - Data Engineer: Focuses on building and maintaining the systems that collect, process, and store large datasets. This involves designing data pipelines, implementing ETL processes, and
managing big data infrastructure. - Backend Developer: Often works with databases as part of web or application development. They design database schemas, write code to interact with databases (using APIs or libraries), and
ensure secure data handling.
Administration & Operations:
- Database Administrator (DBA): Responsible for the overall health, performance, security, and availability of database systems. Tasks include installation, configuration, backup/recovery,
monitoring, troubleshooting, user management, and performance optimization. - DevOps Engineer (with Database Focus): Combines development and operations skills to automate database deployments, infrastructure provisioning, and monitoring. They use tools like Ansible,
Terraform, and Kubernetes to manage databases at scale.
Data Science & Analytics:
- Data Analyst: Collects, cleans, analyzes, and interprets data from various sources, including databases. They create reports, visualizations, and dashboards to extract insights and support
decision-making. - Data Scientist: Uses advanced statistical techniques, machine learning algorithms, and database knowledge to build predictive models, identify patterns in data, and solve complex problems.
Specialized Roles:
- BI Developer (Business Intelligence): Designs and develops dashboards, reports, and analytical tools using data from databases to provide business insights.
- Cloud Database Administrator: Manages cloud-based databases on platforms like AWS, Azure, or GCP. Requires expertise in cloud services and database technologies offered by these providers.
Emerging Fields:
- NoSQL Database Specialist: Focuses on non-relational database systems (e.g., MongoDB, Cassandra).
- Data Security Analyst: Specializes in securing databases from threats, implementing access controls, auditing activities, and responding to security incidents.
This is not an exhaustive list, but it highlights the diverse range of career paths available in the dynamic world of databases!
要成为一名优秀的xxx, 需要掌握哪些知识点?
要成为一名优秀的Database Developer需要掌握哪些知识点
To become a stellar Database Developer, you'll need a blend of technical skills and soft skills. Here's a breakdown of essential knowledge points:
1. Relational Database Fundamentals:
- SQL (Structured Query Language): Mastery is crucial! Understand SELECT, INSERT, UPDATE, DELETE, JOINs (various types), subqueries, aggregate functions, window functions, stored procedures,
triggers, views. - Database Design: Learn about normalization (1NF, 2NF, 3NF), ER diagrams, data modeling principles, choosing appropriate data types, and designing efficient schemas.
2. Specific Database Systems:
- PostgreSQL Expertise: Become deeply familiar with PostgreSQL's features, extensions (e.g., PostGIS for geospatial data, pgcrypto for encryption), functions, performance tuning techniques.
- Other RDBMS Familiarity: Gain some understanding of MySQL, Oracle, SQL Server— knowing the nuances of different systems is valuable in a diverse tech landscape.
3. Programming & Integration:
- Backend Languages: Choose one or two popular languages used for database interaction (Python, Java, C#, Node.js). Understand how to write code that efficiently interacts with databases
through APIs or libraries. - Object-Oriented Programming (OOP): While not strictly database-specific, OOP principles will help you design well-structured code and manage complex database interactions effectively.
4. Data Modeling & Analysis:
- ETL Processes: Understand how to extract, transform, and load data into databases from various sources using tools like Apache NiFi or SQL Server Integration Services (SSIS).
- Basic Data Analytics: Familiarize yourself with concepts like querying for trends, calculating metrics, and performing simple aggregations to gain insights from the data.
5. Performance Optimization & Security:
- Query Tuning: Learn how to write efficient SQL queries, use indexes effectively, analyze query execution plans, and optimize database performance.
- Security Best Practices: Understand concepts like access control, encryption, data masking, and secure coding practices to protect sensitive information stored in databases.
6. Soft Skills for Success:
- Communication: Clearly explain technical concepts to both developers and non-technical stakeholders.
- Collaboration: Work effectively with other team members (developers, analysts, designers).
- Problem-Solving: Identify and troubleshoot database issues efficiently.
Remember: Continuous learning is vital in the ever-evolving world of databases! Stay updated on new technologies, best practices, and industry trends.
要考察PostgreSQL
xxx
职业或岗位的应聘者是否已深刻掌握xxx
知识点, 可以提出哪些连贯性问题(每个问题尽可能是对前一个问题的追问和深入探究), 问题的难度请标记为初级、中级、高级.
请给出问题即可, 不需要给出答案. 问题请以如下开头:
where is the // 去哪里找资料/代码
What is // 名词/原理解释
What is the difference // 热门的差异化对比问题
why // 为什么
how to // 怎么做、如何实现、如何解决
例如, Database Design
初级:
- Where is the SQL query optimizer? (查询优化器的位置)
- What is a query plan? (查询计划是什么?)
- What is the difference between WHERE and HAVING clauses? (WHERE 和 HAVING 子句的区别?)
- Why is indexing important for query performance? (索引为什么对查询性能很重要?)
中级:
- How to identify slow queries in a database system? (如何识别数据库系统中的慢查询?)
- What are some common techniques used for query tuning? (一些常见的查询调优技术有哪些?)
- What is the difference between different types of indexes (e.g., B-tree, hash)? (不同类型的索引(例如B树、哈希)的区别是什么?)
- Why might a query optimizer choose a less intuitive execution plan? (为什么查询优化器可能会选择一个不太直观的执行计划?)
高级:
- How can you analyze the execution statistics of a query to identify bottlenecks? (如何分析查询的执行统计信息来识别瓶颈?)
- What are some advanced techniques for optimizing complex queries involving joins and subqueries? (优化涉及连接和子查询的复杂查询的一些高级技术是什么?)
- How can you use database tracing tools to debug query performance issues? (如何使用数据库跟踪工具来调试查询性能问题?)
- What are the trade-offs between different query tuning strategies? (不同查询调优策略之间的权衡是什么?)
这些问题可以帮助你评估应聘者的对Query Tuning的理解程度,并确定他们是否具备解决实际问题的能力。
此外,你还应该根据具体的职位要求和公司文化调整问题难度和范围。
微调参考如下文章:
提问题的思路也参考了以上文章:
“Where do I",
"Is it okay to",
"Can you help me",
"I need to",
"Is there a",
"Do you know",
"Where is the",
"Can you tell me",
"Can I change",
"What are the",
"How do I",
"When is it",
"Does PostgreSQL have",
"How to",
"What is the difference",
"Can users",
"Can I",
"What is”.