Nichols phenotype browser: ECK accession ids that lack gene names #14

Open
dsiegele opened this issue Apr 22, 2017 · 8 comments

Comments

@dsiegele
Member

dsiegele commented Apr 22, 2017

ECK2859 is ygeQ
ECK2647 is ypjC
ECK1128 is ymfH
ECK1933 is yedM
ECK4219 is yzfA
ECK4265 is yjgW
ECK1453 is yncM
ECK3675 is glvC
ECK0369 is yaiU
ECK2856 is ygeN
ECK2132 is yohH
ECK2675 is ygaY
ECK0679 is ybfH
ECK1132 is ymfT

@sandyl27

Hey Debbie, where did you find these errors?

@dsiegele
Member Author

dsiegele commented Jul 17, 2018

Sandy,

These are strains used by Nichols et al. that had only ECK identifiers, but not gene names. I looked up the gene names in EcoCyc based on the ECK_IDs.

If you enter any of these ECK_IDs into the strain box of the data browser, it returns fitness scores for the various conditions tested, but the rows are identified only by the ECK_ID.
For example, go to this page http://ecoliwiki.net/tools/chemgen/?qtype=s_growth&item1=ECK2859 and click submit.

For most other strains, entering either the gene name or the ECK_ID into the strain box (for example, arcA or ECK4393) returns rows that are identified by both the ECK_ID and the gene name.
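
To check several strains quickly, the query URLs can be generated rather than typed by hand. This is a minimal sketch: the base URL and the `qtype` and `item1` parameters are copied from the example link above, the helper name `strain_query_url` is made up, and I am assuming a gene name can be passed in `item1` the same way an ECK_ID can.

```python
from urllib.parse import urlencode

# Base URL and parameter names are taken from the example link in this comment.
BASE_URL = "http://ecoliwiki.net/tools/chemgen/"


def strain_query_url(strain: str) -> str:
    """Build a data browser URL for one strain (ECK_ID or gene name)."""
    # Assumption: gene names are accepted in item1 just like ECK_IDs.
    return BASE_URL + "?" + urlencode({"qtype": "s_growth", "item1": strain})


# Print one URL per strain so each can be opened and compared by eye.
for strain in ["ECK2859", "ECK4393", "arcA"]:
    print(strain_query_url(strain))
```

The script only builds the links; it does not fetch or parse the results page.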

@dsiegele
Member Author

I came across some additional problems:

  1. I found more strains in the data browser that don't have gene names: ECK4426 and ECK3474. There are probably more; I will look through the list and see what I find.

  2. I found a strain that has the wrong gene name. ECK2858 should be named ygeP, but is named ECK2858-ygeQ' in the data browser. ECK2859, which is one of the strains that doesn't have a gene name, should be named ygeQ. This mistake isn't in the list of strains in the Nichols paper (TableS2-column 1).

  3. The strain list and the data browser have different numbers of strains. If you enter a condition, such as novobiocin-12, you get back information for 3,979 entries, whereas if you click on the box 'List strains,' you get a list that contains only 3,967 entries.

I am going to compare the 3,967 strains with the strain list from Nichols_TableS2 and double check all the gene names with what is in EcoCyc.
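
The gene-name check could be scripted along these lines. This is only a sketch: it assumes row labels have the form `ECK_ID` or `ECK_ID-gene`, that a trailing `'` or `*` is a marker rather than part of the gene name, and that `reference` is a hand-built ECK_ID-to-gene-name table taken from EcoCyc. The function names are hypothetical, and the three reference entries are just the ones mentioned in this thread.

```python
def split_label(label):
    """Split 'ECK2858-ygeQ' into ('ECK2858', 'ygeQ'); gene is None if absent."""
    eck_id, _, gene = label.partition("-")
    return eck_id, gene or None


def find_mismatches(labels, reference):
    """Return {ECK_ID: (name_in_browser, expected_name)} for problem rows."""
    problems = {}
    for label in labels:
        eck_id, gene = split_label(label)
        expected = reference.get(eck_id)
        if expected is None:
            continue  # no reference name available, nothing to compare
        if gene is None:
            problems[eck_id] = (None, expected)  # gene name missing entirely
        elif gene.rstrip("'*").lower() != expected.lower():
            problems[eck_id] = (gene, expected)  # gene name disagrees
    return problems


reference = {"ECK2858": "ygeP", "ECK2859": "ygeQ", "ECK4393": "arcA"}
labels = ["ECK2858-ygeQ'", "ECK2859", "ECK4393-arcA"]
print(find_mismatches(labels, reference))
```

Labels with extra hyphens or annotations would need special handling before a simple split like this is reliable.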

@dsiegele
Member Author

dsiegele commented Jul 17, 2018

  1. The difference in the number of strains is due to the 12 rows that appear to be duplicated in TableS2. The duplicated rows are:
    ECK0295-YKGO
    ECK1323-YMJC'
    ECK1544-GNSB
    ECK1556-HOKD
    ECK1824-MGRB
    ECK2613-SMPA
    ECK3357-YHFL
    ECK3531-DPPA
    ECK4410-YDGU
    ECK4415-YPFM
    ECK4416-RYFB
    ECK2593-A-YFIO* - Truncation

  2. How were the duplicates handled in the data browser? I searched for one of the duplicated strains, ECK1323, and the condition novobiocin. There were two entries for each condition. So the data browser has the data for each of the duplicates, whereas the strain list has only 1 listing of each strain.
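
For reference, duplicated rows like these can be found by counting labels. A minimal sketch, assuming the first column of TableS2 has been read into a plain list of strings; the function names are hypothetical and the short list below is a toy example built from labels in this thread, not the real table.

```python
from collections import Counter


def duplicated_labels(labels):
    """Return the labels that occur more than once, in first-seen order."""
    counts = Counter(labels)
    return [label for label, n in counts.items() if n > 1]


def extra_rows(labels):
    """Number of rows beyond the first occurrence of each label."""
    return len(labels) - len(set(labels))


# Toy stand-in for the first column of TableS2.
rows = ["ECK0295-YKGO", "ECK1323-YMJC'", "ECK4393-ARCA",
        "ECK1323-YMJC'", "ECK0295-YKGO"]
print(duplicated_labels(rows))
print(extra_rows(rows))
```

On the full table, `extra_rows` would be expected to give 12 if the duplicates fully account for the 3,979 versus 3,967 difference.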

@dsiegele
Member Author

dsiegele commented Jul 17, 2018

There are 41 rows in the data browser that are missing the gene name that goes with the ECK_ID. I will get the gene names for these from EcoCyc.
ECK0012
ECK0017
ECK0266
ECK0320
ECK0359
ECK0367
ECK0369
ECK0503
ECK0619
ECK0679
ECK1128
ECK1132
ECK1159
ECK1160
ECK1453
ECK1933
ECK1990
ECK2132
ECK2331
ECK2636
ECK2637
ECK2647
ECK2650
ECK2651
ECK2652
ECK2675
ECK2854
ECK2856
ECK2859
ECK2994
ECK3474
ECK3672
ECK3675
ECK3769
ECK3802
ECK4097
ECK4219
ECK4265
ECK4330
ECK4334
ECK4426
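
Once the names are collected, rows like these could be found and relabeled automatically. This is a sketch under a few assumptions: a row with no gene name is labeled with the bare ECK_ID (ECK plus four digits), fixed labels should read `ECK_ID-gene`, and `KNOWN_NAMES` holds only three of the pairs from my first comment. The function name is hypothetical.

```python
import re

# A label that is nothing but an ECK_ID, e.g. "ECK2859".
ECK_ONLY = re.compile(r"ECK\d{4}")

# A few ECK_ID -> gene name pairs from the first comment in this issue.
KNOWN_NAMES = {"ECK2859": "ygeQ", "ECK2647": "ypjC", "ECK0369": "yaiU"}


def add_gene_names(labels, names):
    """Append gene names to bare ECK_ID labels; report the ones still unnamed."""
    fixed, unresolved = [], []
    for label in labels:
        if ECK_ONLY.fullmatch(label) and label in names:
            fixed.append(f"{label}-{names[label]}")
        else:
            fixed.append(label)  # already named, or no name known yet
            if ECK_ONLY.fullmatch(label):
                unresolved.append(label)
    return fixed, unresolved


fixed, unresolved = add_gene_names(["ECK2859", "ECK4393-arcA", "ECK0012"], KNOWN_NAMES)
print(fixed)
print(unresolved)
```

Anything left in `unresolved` still needs to be looked up in EcoCyc.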

@sandyl27

Oh, so is this on EcoliWiki or OMP?

@dsiegele
Member Author

dsiegele commented Jul 17, 2018

I searched for one of the duplicated strains, ECK1323, and the condition novobiocin. There were two entries for each condition. This explains the difference in the number of entries when you search for a condition and the number of strains in the strain list.

@dsiegele
Member Author

I didn't find the data browser on OMP until today. Last night, I saw that the link to the data browser on the main page of OMP goes to the data browser at EcoliWiki. When I searched for the string "Nichols," I found only papers and the link on the main page. It occurred to me this afternoon that the string search might not have searched the Special Pages, so I went to the list of Special Pages and found the link to the Nichols data browser that is on OMP.

On both versions of the Nichols data browser, the ECK_IDs I listed above need gene names added.

The OMP Nichols data browser is missing boxes where you can link to the List of strains and the List of Conditions.

On the OMP version of the Nichols data browser, if I select the radio button 'Growth data (Strain/condition)' and enter a specific condition, such as Novobiocin-12, I get a list of the fitness scores for all 3,979 strain rows from the Nichols paper, which indicates to me that the strain list includes the 12 strains that were done in duplicate. However, if I select the radio button 'Growth data (Strain/condition),' enter ECK1323, which is one of the strains done in duplicate, and a condition, such as Novobiocin, I get fitness scores for only one of the 2 duplicates. In contrast, if I do the same thing on the EcoliWiki data browser, I get both sets of fitness scores.

I can see arguments for including both sets of data for each of these 12 strains or for only including one of the strains in the data browser. Whatever we decide, we should try to make the number of strains consistent. We could give the duplicate samples different names to keep them separate. Maybe ECK1323 (row 586) and ECK2859-ygeQ (row 3299)???
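
If we go with separate names, one way to generate them is to append the row number to any label that occurs more than once. A minimal sketch, assuming the labels are in a list in table order; `disambiguate` is a hypothetical name, and the row numbers in the toy example are positions in that short list, not real TableS2 rows.

```python
from collections import Counter


def disambiguate(labels):
    """Append ' (row N)' to every label that appears more than once."""
    counts = Counter(labels)
    result = []
    for row, label in enumerate(labels, start=1):  # 1-based row numbers
        if counts[label] > 1:
            result.append(f"{label} (row {row})")
        else:
            result.append(label)
    return result


print(disambiguate(["ECK1323-YMJC'", "ECK4393-arcA", "ECK1323-YMJC'"]))
```

Unique labels are left unchanged, so only the 12 duplicated strains would get new names.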
