Add an optimized database #271

Open
wants to merge 3 commits into
base: main

viveksahu26
Collaborator

@viveksahu26 viveksahu26 commented Jul 1, 2024

Closes issue: #265
This pull request introduces an optimized version of the db package to improve the performance of our database operations. Benchmark tests have also been added to measure the performance and efficiency of the new implementation.

The changes are:

  • Introduces new db structure for efficient retrieval:
    • records: Map to store records by their check_key.
    • ids: Map to store records by their id.
    • keyIds: Nested map to store records by both check_key and id.
    • allIds: Map to store unique IDs.

Storing records in maps makes retrieval much faster: a lookup by key is O(1) on average. The original db, by contrast, loops over all records to retrieve them, which is O(n).
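
To illustrate the difference, here is a minimal sketch; it is not the code from this PR. The field types of `record` and the helper names `scanByKey` and `buildKeyIndex` are assumptions made for the example.

```go
package main

// record is a simplified stand-in for the db package's record type.
// The field types are assumptions; the names mirror the PR.
type record struct {
	check_key int
	id        string
}

// scanByKey mirrors the original approach: every lookup visits
// every record, so it costs O(n).
func scanByKey(records []*record, key int) []*record {
	var out []*record
	for _, r := range records {
		if r.check_key == key {
			out = append(out, r)
		}
	}
	return out
}

// buildKeyIndex groups the records by check_key once. After that,
// each lookup is a single map access, O(1) on average.
func buildKeyIndex(records []*record) map[int][]*record {
	idx := make(map[int][]*record)
	for _, r := range records {
		idx[r.check_key] = append(idx[r.check_key], r)
	}
	return idx
}
```

The trade-off is that building the index costs extra memory and a little work on every insert, in exchange for cheap reads.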

Use of Mutex concept:

Apart from that, benchmark tests have been added to measure the performance of key database operations:

  • Insert: Measures the time taken to insert a large number of records.
  • GetByKey: Measures the time taken to retrieve records by check_key.
  • GetByID: Measures the time taken to retrieve records by id.
  • GetByKeyAndID: Measures the time taken to retrieve records by both check_key and id.
  • GetAllIDs: Measures the time taken to retrieve all unique IDs.

Implementation of goroutines for parallel tasking:
- Goroutines are functions or methods that run concurrently with other functions or methods, and are started with the go keyword. Channels are used to send and receive values, which lets goroutines synchronize their operations. sync.RWMutex ensures that concurrent access to the database is handled correctly.
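
A minimal sketch of how concurrent inserts could be guarded by `sync.RWMutex` follows. It uses `sync.WaitGroup` rather than channels to wait for the goroutines, and `addConcurrently`, `newDB`, and `getByKey` are hypothetical names, not taken from this PR.

```go
package main

import "sync"

// record and db are simplified, hypothetical stand-ins for the PR's types.
type record struct {
	check_key int
	id        string
}

type db struct {
	mu         sync.RWMutex
	keyRecords map[int][]*record
}

func newDB() *db {
	return &db{keyRecords: make(map[int][]*record)}
}

// addRecord takes the write lock, so only one goroutine mutates the map at a time.
func (d *db) addRecord(r *record) {
	d.mu.Lock()
	defer d.mu.Unlock()
	d.keyRecords[r.check_key] = append(d.keyRecords[r.check_key], r)
}

// getByKey takes the read lock; many readers may hold it at once.
func (d *db) getByKey(key int) []*record {
	d.mu.RLock()
	defer d.mu.RUnlock()
	return d.keyRecords[key]
}

// addConcurrently inserts each record from its own goroutine and
// returns once all of them have finished.
func addConcurrently(d *db, recs []*record) {
	var wg sync.WaitGroup
	for _, r := range recs {
		wg.Add(1)
		go func(r *record) {
			defer wg.Done()
			d.addRecord(r)
		}(r)
	}
	wg.Wait()
}
```

Without the lock, concurrent writes to the same Go map are a data race and can crash the program.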

Signed-off-by: Vivek Kumar Sahu <[email protected]>
@viveksahu26 viveksahu26 force-pushed the issue_265_db_optimized branch 3 times, most recently from 57ec133 to 79f643b Compare July 2, 2024 06:19
@riteshnoronha
Contributor

@viveksahu26 same here is this ready for review

@viveksahu26
Collaborator Author

@viveksahu26 same here is this ready for review

Yeah, this one too.

@riteshnoronha
Contributor

@viveksahu26 this appears to be too complex. Let's discuss this.

Signed-off-by: Vivek Kumar Sahu <[email protected]>
@viveksahu26
Collaborator Author

Yeah sure @riteshnoronha!! So, basically, we were originally fetching records from the database in 3 ways:

  • For Key (Ex: SBOM_SPEC, SBOM_SPEC_VERSION, SBOM_TIMESTAMP, etc)
  • For ID (Ex: SPDX Elements, SBOM Format SBOM Build Information, etc)
  • For Key and ID

For example, to get all records for a particular Key, we had to loop over all records and check whether each record contained that Key. If it did, we appended it to the final list of records for that Key, and returned that list when the loop ended. We did the same to get records for an ID, and likewise for a Key and ID. So, in short, we always end up returning all the values (i.e. records) for a particular key.

In the new changes, we have simplified this with the Map data structure, since it can store all values for a particular key.
Earlier, when adding a record, we simply appended it to the list of records.

func (d *db) addRecord(r *record) {
	d.records = append(d.records, r)
}

But now, when adding a record, we add it in 3 ways:

  • Map key as a Key and Map value as a Record.
  • Map key as an ID and Map value as a Record.
  • Map key as a Key & ID and Map value as a Record.
// addRecord adds a single record to the database
func (d *db) addRecord(r *record) {
	d.mu.Lock()
	defer d.mu.Unlock()

	// store record using a key
	d.keyRecords[r.check_key] = append(d.keyRecords[r.check_key], r)

	// store record using an id
	d.idRecords[r.id] = append(d.idRecords[r.id], r)

	// create the inner map on first use of this id; writing to a nil map panics
	if d.keyIdRecords[r.id] == nil {
		d.keyIdRecords[r.id] = make(map[int][]*record)
	}

	// store record using a key and id
	d.keyIdRecords[r.id][r.check_key] = append(d.keyIdRecords[r.id][r.check_key], r)

	// remember the id in the set of unique ids
	d.allIds[r.id] = struct{}{}
}

We add records this way so that when fetching them by any type of key, whether Key, ID, or Key and ID, we don't have to loop over the records and check whether each one matches.
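
With the three indexes in place, each fetch becomes a direct map access. The sketch below shows what the lookups could look like; the getter names, `newDB`, and the field types (`check_key` as `int`, `id` as `string`) are assumptions, and the mutex is left out to keep the example short.

```go
package main

import "sort"

// record and db are simplified stand-ins; the field types are assumptions.
type record struct {
	check_key int
	id        string
}

type db struct {
	keyRecords   map[int][]*record            // check_key -> records
	idRecords    map[string][]*record         // id -> records
	keyIdRecords map[string]map[int][]*record // id -> check_key -> records
	allIds       map[string]struct{}          // set of unique ids
}

func newDB() *db {
	return &db{
		keyRecords:   make(map[int][]*record),
		idRecords:    make(map[string][]*record),
		keyIdRecords: make(map[string]map[int][]*record),
		allIds:       make(map[string]struct{}),
	}
}

// addRecord stores the record under all three indexes.
func (d *db) addRecord(r *record) {
	d.keyRecords[r.check_key] = append(d.keyRecords[r.check_key], r)
	d.idRecords[r.id] = append(d.idRecords[r.id], r)
	if d.keyIdRecords[r.id] == nil {
		d.keyIdRecords[r.id] = make(map[int][]*record)
	}
	d.keyIdRecords[r.id][r.check_key] = append(d.keyIdRecords[r.id][r.check_key], r)
	d.allIds[r.id] = struct{}{}
}

// Each getter is one or two map accesses; none of them loops over all records.
func (d *db) getByKey(key int) []*record  { return d.keyRecords[key] }
func (d *db) getByID(id string) []*record { return d.idRecords[id] }

// Reading from a missing (nil) inner map is safe in Go and yields nil.
func (d *db) getByKeyAndID(key int, id string) []*record {
	return d.keyIdRecords[id][key]
}

// getAllIDs returns the unique ids, sorted for a stable order.
func (d *db) getAllIDs() []string {
	ids := make([]string, 0, len(d.allIds))
	for id := range d.allIds {
		ids = append(ids, id)
	}
	sort.Strings(ids)
	return ids
}
```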

Now, coming to the concept of Mutex, which we use every time as shown below:
Mutex: ensures that only one thread, process, or goroutine can access a critical section of code or a shared resource at any given time.

NOTE: it just struck me while writing this that we can remove the mutex, because we are not using goroutines.

Here we are using the read-write mutex type, sync.RWMutex.

mu           sync.RWMutex

Each time a write is made to the db, we want to make sure that no other process, thread, or goroutine is writing at the same time. That's why we lock the operation, and unlock it once the write is done. The next goroutine waiting on the lock then performs its write.

d.mu.Lock()
defer d.mu.Unlock()

and

The read lock below is used when multiple processes, threads, or goroutines are trying to read the database. Readers can hold it concurrently, but they wait while a writer holds the lock.

d.mu.RLock()
defer d.mu.RUnlock()

Let me know if you have any questions.


2 participants