Nichols phenotype browser: ECK accession ids that lack gene names #14
Hey Debbie, where did you find these errors?
Sandy, these are strains used by Nichols et al. that had only ECK identifiers, not gene names. I looked up the gene names in EcoCyc based on the ECK_IDs. If you enter any of these ECK_IDs into the strain box of the data browser, it returns fitness scores for the various conditions tested, but the rows are identified by the ECK_ID alone. For most other strains, entering either the gene name or the ECK_ID into the strain box (for example, arcA or ECK4393) returns rows that are identified by both the ECK_ID and the gene name.
I came across some additional problems:
I am going to compare the 3,967 strains with the strain list from Nichols_TableS2 and double check all the gene names with what is in EcoCyc.
There are 41 rows in the data browser that are missing the gene name that goes with the ECK_ID. I will get the gene names for these from EcoCyc.
Oh, so is this on EcoliWiki or OMP?
I searched for one of the duplicated strains, ECK1323, and the condition novobiocin. There were two entries for each condition. This explains the difference in the number of entries when you search for a condition and the number of strains in the strain list.
I didn't find the data browser on OMP until today. Last night, I saw that the link to the data browser on the main page of OMP goes to the data browser at EcoliWiki. When I searched for the string "Nichols," I found only papers and the link on the main page. It occurred to me this afternoon that the string search might not have covered the Special Pages, so I went to the list of Special Pages and found the link to the Nichols data browser that is on OMP. On both versions of the Nichols data browser, the ECK_IDs I listed above need gene names added. The OMP Nichols data browser is also missing the boxes that link to the List of strains and the List of Conditions. On the OMP version, if I select the radio button 'Growth data (Strain/condition)' and enter a specific condition, such as Novobiocin-12, I get a list of the fitness scores for all 3,979 strain rows from the Nichols paper, which indicates to me that the strain list includes the 12 strains that were done in duplicate. However, if I select the same radio button and enter ECK1323, which is one of the strains done in duplicate, along with a condition such as Novobiocin, I get fitness scores for only one of the 2 duplicates. In contrast, if I do the same thing on the EcoliWiki data browser, I get both sets of fitness scores. I can see arguments for including both sets of data for each of these 12 strains or for including only one set per strain in the data browser. Whatever we decide, we should try to make the number of strains consistent. We could give the duplicate samples different names to keep them separate. Maybe ECK1323 (row 586) and ECK2859-ygeQ (row 3299)?
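To make the renaming idea concrete, here is a minimal sketch of how duplicated strain rows could be given distinct names by appending the row number. It assumes the strain list is available as (row number, strain ID) pairs; that layout, the function names, and the row numbers in the usage note are hypothetical and not taken from either data browser.

```python
from collections import Counter


def label_duplicates(rows):
    """Return one label per row, adding "(row N)" only to repeated strains.

    rows: list of (row_number, strain_id) tuples (hypothetical layout).
    """
    counts = Counter(strain for _, strain in rows)
    labels = []
    for row_number, strain in rows:
        if counts[strain] > 1:
            # Strain was measured more than once: keep the rows apart.
            labels.append(f"{strain} (row {row_number})")
        else:
            labels.append(strain)
    return labels


def count_summary(rows):
    """Return (number of rows, number of unique strain IDs)."""
    unique = {strain for _, strain in rows}
    return len(rows), len(unique)
```

For example, `label_duplicates([(1, "ECK1323"), (2, "ECK4393"), (3, "ECK1323")])` labels the two ECK1323 rows separately and leaves ECK4393 unchanged. Running `count_summary` on the full list would show whether the row count and the unique strain count differ by the 12 duplicated strains (3,979 versus 3,967).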
ECK2859 is ygeQ
ECK2647 is ypjC
ECK1128 is ymfH
ECK1933 is yedM
ECK4219 is yzfA
ECK4265 is yjgW
ECK1453 is yncM
ECK3675 is glvC
ECK2647 is ypjC
ECK0369 is yaiU
ECK2856 is ygeN
ECK2132 is yohH
ECK2675 is ygaY
ECK2859 is ygeQ
ECK0679 is ybfH
ECK1132 is ymfT
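The list above could be applied to the row labels with something like the following sketch. The mapping holds only the pairs listed in this thread (repeated entries collapsed). The `ECK_ID-gene` label format is an assumption based on the "ECK2859-ygeQ" example, and `add_gene_name` is a hypothetical helper, not part of either data browser.

```python
# ECK_ID -> gene name, copied from the list in this thread.
ECK_TO_GENE = {
    "ECK2859": "ygeQ",
    "ECK2647": "ypjC",
    "ECK1128": "ymfH",
    "ECK1933": "yedM",
    "ECK4219": "yzfA",
    "ECK4265": "yjgW",
    "ECK1453": "yncM",
    "ECK3675": "glvC",
    "ECK0369": "yaiU",
    "ECK2856": "ygeN",
    "ECK2132": "yohH",
    "ECK2675": "ygaY",
    "ECK0679": "ybfH",
    "ECK1132": "ymfT",
}


def add_gene_name(label, mapping=ECK_TO_GENE):
    """Append the gene name to a bare ECK_ID label (assumed "ECK-gene" format).

    Labels that already carry a gene name, or that are not in the mapping,
    are returned unchanged so they can be reviewed by hand.
    """
    if "-" in label:
        return label
    gene = mapping.get(label)
    return f"{label}-{gene}" if gene else label
```

With this, `add_gene_name("ECK2859")` gives `"ECK2859-ygeQ"`, while a label such as `"ECK4393-arcA"` passes through untouched. Any bare ECK_ID that comes back unchanged still needs a gene name looked up in EcoCyc.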