Table 3.

Performance of the agreement models compared to random and majority baselines on the manually annotated test set for the task of predicting concept presence in a patient note. Bold text marks the best-performing approach.

| Approach | Concept accuracy | Concept precision | Concept recall | Concept F1 |
|---|---|---|---|---|
| Random baseline | 0.31 | 0.45 | 0.31 | 0.35 |
| Majority baseline | 0.65 | 0.42 | 0.65 | 0.51 |
| Llama2-7B | 0.43 | 0.62 | 0.43 | 0.38 |
| Llama2-13B | 0.47 | 0.68 | 0.48 | 0.50 |
| GPT-4 (prompt does not contain concept definition) | 0.85 | 0.86 | 0.82 | 0.83 |
| GPT-4 (prompt contains concept definition) | **0.86** | **0.87** | **0.86** | **0.84** |
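
For reference, a minimal sketch of how accuracy, precision, recall, and F1 for the concept-presence task could be computed with scikit-learn. This is not the authors' evaluation code; the weighted averaging scheme, the binary present/absent label encoding, and the example arrays are assumptions for illustration.

```python
# Hypothetical sketch: scoring concept-presence predictions against manual
# annotations. Averaging scheme and label encoding are assumptions, not
# taken from the paper.
from sklearn.metrics import accuracy_score, precision_recall_fscore_support

# Hypothetical example data: 1 = concept present in the note, 0 = absent.
y_true = [1, 0, 1, 1, 0, 1, 0, 0]   # manual (gold) annotations
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]   # model predictions

accuracy = accuracy_score(y_true, y_pred)
precision, recall, f1, _ = precision_recall_fscore_support(
    y_true, y_pred, average="weighted", zero_division=0
)

print(f"accuracy={accuracy:.2f} precision={precision:.2f} "
      f"recall={recall:.2f} f1={f1:.2f}")
```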