Performance of the agreement models compared with random and majority baselines on the manually annotated test set for the task of predicting concept presence in a patient note. Bolded text represents the best-performing approach.
Approach | Concept accuracy | Concept precision | Concept recall | Concept F1 |
---|---|---|---|---|
Random baseline | 0.31 | 0.45 | 0.31 | 0.35 |
Majority baseline | 0.65 | 0.42 | 0.65 | 0.51 |
Llama2-7B | 0.43 | 0.62 | 0.43 | 0.38 |
Llama2-13B | 0.47 | 0.68 | 0.48 | 0.50 |
GPT-4 (prompt does not contain concept definition) | 0.85 | 0.86 | 0.82 | 0.83 |
**GPT-4 (prompt contains concept definition)** | **0.86** | **0.87** | **0.86** | **0.84** |