Figure 2.
Overview of the creation of our 12 datasets (Section 2.5). We create two sets of proteins: allAF (letter code a) and reliableAF (letter code rl). Within each of the allAF and reliableAF protein sets, we create a sequence redundant (letter code r) and a sequence non-redundant (letter code nr) set of proteins. From each of the resulting four protein sets, we create three datasets in which the ratio of the number of non-TFs to the number of TFs is 3, 5, or 10. We name a dataset using the convention D(x,z,s), where x∈{rl,a}, z∈{nr,r}, and s∈{3,5,10}. In the figure, the shaded boxes highlight the parts of the data creation logic that lead to the sequence non-redundant datasets.
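The naming convention can be made concrete by enumerating every combination of the three letter codes. The following is a minimal sketch, not the authors' code; the constant and function names (PROTEIN_SETS, REDUNDANCY, RATIOS, dataset_names) are hypothetical and chosen only for illustration.

```python
from itertools import product

# Letter codes taken from the caption; the variable names are illustrative assumptions.
PROTEIN_SETS = ("rl", "a")   # reliableAF, allAF
REDUNDANCY = ("nr", "r")     # sequence non-redundant, sequence redundant
RATIOS = (3, 5, 10)          # number of non-TFs per TF


def dataset_names():
    """Return the names D(x,z,s) for every combination of x, z, and s."""
    return [
        f"D({x},{z},{s})"
        for x, z, s in product(PROTEIN_SETS, REDUNDANCY, RATIOS)
    ]


names = dataset_names()
print(len(names))   # 2 * 2 * 3 = 12 datasets
print(names[:3])    # ['D(rl,nr,3)', 'D(rl,nr,5)', 'D(rl,nr,10)']
```

Filtering this list for names whose second code is nr gives the six sequence non-redundant datasets that the shaded boxes lead to.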

