Skip to content
jordanell edited this page Jul 3, 2012 · 19 revisions

This is the database schema that is created when the scm2pgsql project has been run. Below is a list of tables and information regarding the records that are stored in each along with their description.

##Commits This table stores the information for a commit such as commit id, author, email, comment, commit date and brand id. Please note git stores commits chronologically but this order is not necessary correct. The commit should be viewed in a parent and child relation ship as stored in Commit_family table. Columns are as follows.

id commit_id author author_email comments commit_date branch_id
integer not null auto_increment character varying(255) character varying(255) character varying(255) text timestamp with time zone (ex. "2012-05-07 09:33:49-07") character varying(255)
###Notes * **commit_date** example: __timestamp with time zone (ex. "2012-05-07 09:33:49-07")__

##File_diffs This table stores the differences of a file between 2 commits. First commit has an empty old_commit_id. Each entry represents a diff in a file which contains the raw text, the start character, end character and diff type.

file_id new_commit_id old_commit_id diff_text char_start char_end diff_type
character varying(255) character varying(255) character varying(255) text integer integer character varying(30)

###Notes

  • Char_start and Char_end of DIFF_MODIFYDELETE are relative to OLD FILE, and DIFF_MODIFYINSERT are relative to NEW FILE.
  • Char_end is offset by 1. For example, char_end = 10 means the last character of the diff is at index 9.
  • OLD FILE = EQUAL + DIFF_MODIFYDELETE
  • NEW FILE = EQUAL + DIFF_MODIFYINSERT
  • EQUAL can be calculated if we have either OLD FILE or NEW FILE. Since we have DIFF_MODIFYDELETE and MODIFYINSERT, one can figure out the NEW FILE from an OLD FILE and vice versa.

Diff type could be one of the following:

  • DIFF_ADD - file didn't exist in the old commit and was added in the new commit. A pair of commits should only have 1 Add entry.
  • DIFF_DELETE - file existed in the old commit and was deleted in the new commit. The file_id is the file name in the old commit. A pair of commits should only have 1 pair of Delete entry.
  • DIFF_MODIFYINSERT - file exists in the old commit and was inserted a block of codes in the new commit. A pair of commits can have multiple modifyinsert entries.
  • DIFF_MODIFYDELETE - file exists in the old commit and was deleted a block of codes in the new commit. A pair of commits can have multiple modifydelete entries.
  • DIFF_COPY - file was copied to different place. Never run into this case.
  • DIFF_RENAME - file was renamed to new file name. Currently JGit sees a rename file as a DIFF_DELETE and DIFF_ADD instead of DIFF_RENAME. This type is here just in case JGit decides to handle rename differently.
  • DIFF_UNKNOWN - error while reading the file diff, unknown type

##Branches This table stores the branch id and all of its commit id. A commit can appear in multiple branches. The first commit always has a NULL branch (just the way JGit handle it).

branch_id branch_name commit_id
character varying(255) character varying(100) character varying(255)

##Commit_family This table stores the whole commit graphs. Each entry show a parent and child relationship between two commits. A Parent can have multiple children and a child can have multiple Parents.

parent child
character varying(255). Not null character varying(255). Not null

##Owners This table stores the JGit Blame for a file in a given commit. The ownership gets updated only when a file is changed in a commit. The ownership does not store the ownership of a file for every single commit.

commit_id source_commit_id owner_id file_id char_start char_end change_type
character varying(255). Not null character varying(255). Not null character varying(255). Not null character varying(255). Not null integer. Not null integer. Not null character varying(12). Not null

##Networks This table stores the network generated from the callgraph analyzer.

new_commit_id old_commit_id network_id
character varying(255) character varying(255) integer. Not null, auto increment

###Notes

  • Indexes: "networks_pkey" PRIMARY KEY, btree (network_id) Referenced by: TABLE "edges" CONSTRAINT "edges_network_id_fkey" FOREIGN KEY (network_id) REFERENCES networks(network_id) ON DELETE CASCADE TABLE "nodes" CONSTRAINT "nodes_network_id_fkey" FOREIGN KEY (network_id) REFERENCES networks(network_id) ON DELETE CASCADE

##Nodes This table stores the node for Gephi graph visualizer.

id label network_id
character varying(255). Not null character varying(255) integer

###Notes

  • Foreign-key constraints: "nodes_network_id_fkey" FOREIGN KEY (network_id) REFERENCES networks(network_id) ON DELETE CASCADE

##Edges This table stores the Edges for Gephi graph visualizer.

source target weight is_fuzzy network_id
character varying(255) character varying(255) real boolean integer

###Notes

  • Foreign-key constraints: "edges_network_id_fkey" FOREIGN KEY (network_id) REFERENCES networks(network_id) ON DELETE CASCADE
Clone this wiki locally