Figure 6.
Comparison of the performance of PHIStruct with same-architecture multilayer perceptron models that take in sequence-only embeddings. The maximum train-versus-test sequence similarity is set to . Performance is measured in terms of class-averaged (macro) metrics. (a) Precision–recall curves. The label of each point denotes the confidence threshold k (%) at which the performance was measured. (b) F1 scores. Higher values of k prioritize precision over recall, whereas lower values prioritize recall.
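As an illustration of how a confidence threshold k can trade precision against recall under class-averaged (macro) metrics, the following is a minimal sketch and not the authors' evaluation code. It assumes that a prediction whose top class probability falls below k is assigned to a catch-all label (here the hypothetical `reject_label="others"`), that macro scores are averaged over the named classes only, and that a metric with a zero denominator counts as 0. The function name `macro_metrics` and the toy data are hypothetical.

```python
def macro_metrics(y_true, probs, classes, k, reject_label="others"):
    """Return macro (precision, recall, F1) at confidence threshold k (in %).

    Assumption: a sample whose highest class probability is below k percent
    is assigned to `reject_label` instead of its top-scoring class.
    """
    # Apply the confidence threshold to obtain hard predictions.
    y_pred = []
    for p in probs:
        best = max(range(len(classes)), key=lambda i: p[i])
        y_pred.append(classes[best] if p[best] * 100 >= k else reject_label)

    precisions, recalls, f1s = [], [], []
    for c in classes:
        tp = sum(t == c and q == c for t, q in zip(y_true, y_pred))
        fp = sum(t != c and q == c for t, q in zip(y_true, y_pred))
        fn = sum(t == c and q != c for t, q in zip(y_true, y_pred))
        # Assumption: undefined ratios (zero denominator) are scored as 0.
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        precisions.append(prec)
        recalls.append(rec)
        f1s.append(f1)

    # Macro averaging: unweighted mean of the per-class scores.
    n = len(classes)
    return sum(precisions) / n, sum(recalls) / n, sum(f1s) / n


if __name__ == "__main__":
    # Toy data (hypothetical): two classes, four samples.
    classes = ["A", "B"]
    y_true = ["A", "A", "B", "B"]
    probs = [[0.9, 0.1], [0.55, 0.45], [0.6, 0.4], [0.2, 0.8]]
    for k in (50, 70):
        p, r, f = macro_metrics(y_true, probs, classes, k)
        print(f"k={k}%: precision={p:.3f} recall={r:.3f} F1={f:.3f}")
```

On this toy data, raising k from 50% to 70% increases macro precision and lowers macro recall, because the two low-confidence predictions are moved to the catch-all label.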