Specifications for analyses using tree-temporal scan statistics applied prospectively to genomic surveillance data among NYC residents
Feature | SARS-CoV-2 | Salmonella | Notes |
---|---|---|---|
Genomic data resolution | Pango lineages | Allele codes | The “nodes” in our genomic surveillance trees represent SARS-CoV-2 variants or Salmonella allele codes |
Temporal element | Specimen collection date | Specimen collection date (or upload date, in sensitivity analyses) | The specimen collection date is the most epidemiologically relevant date, representing when patients sought care. For Salmonella, to accommodate delays between specimen collection and allele code assignment, we also conducted sensitivity analyses, in which the temporal element was the date uploaded to the System for Enteric Disease Response, Investigation, and Coordination (SEDRIC) |
Time precision | Day | Day | We used data at daily resolution (as opposed to aggregating by week or month) to improve precision in cluster start dates |
Study period | 12-week period ending on the most recent specimen collection date | 1-year period ending on the most recent specimen collection (or upload) date | For SARS-CoV-2, because of rapid variant turnover, we used a short study period of 12 weeks (84 days), three times the 28-day maximum temporal cluster size (see below). For Salmonella, we used the standard study period of 1 year [35] |
Only allow data on leaves of tree | No | No | For genomic surveillance data, valid patient results could be assigned to any node on the tree, not only to the most specific nodes (leaves) |
Allow multiple parents for the same node | Yes | No | Beginning in January 2024, we assigned multiple parents to recombinant SARS-CoV-2 lineages. For Salmonella, each node had only one parent |
Type of scan | Tree and time | Tree and time | We scanned for increases in cases at any node or group of related nodes and over any recent time period |
Conditional analysis | Node and time | Node and time | We conditioned on time to adjust nonparametrically for any citywide purely temporal patterns, such as data-reporting lags or increasing or decreasing trends. We also conditioned on node to account for whether cases had historically been common or rare at each node during the baseline period, because we were interested in detecting newly emerging nodes, not nodes that had also been common during the baseline period |
Scan for branches with: | High rates | High rates | We wished to detect clusters as they emerged, not as they declined |
Maximum temporal size | 28 days | 90 days | For SARS-CoV-2, we searched for increases in variants during the most recent 14, 15, 16, …, 27, or 28 days to balance recency and persistence. For Salmonella, we searched for allele codes with increases during the most recent 1, 2, 3, …, 89, or 90 days to encompass the 60 days used in the standard rule-based Salmonella cluster definition, plus an additional 30 days to accommodate data lags |
Minimum temporal size | 14 days | 1 day | |
Prospective evaluation | Yes | Yes | We used prospective analyses to search for emerging rather than historical clusters, considering only temporal windows that end on the study period end date |
Perform node-by-day-of-week adjustment | No | No | Sequencing results were unlikely to vary by the day of the week on which the specimen was collected |
Inference method | Sequential Monte Carlo | Sequential Monte Carlo | We used a sequential method with an early termination cutoff, which allowed runs to terminate early if there were no unusual clusters |
Monte Carlo replications | 999,999 | 99,999 | To slightly improve performance, we used more than the standard 999 Monte Carlo replications, as far as computing time allowed; computing time depends on the number of tree nodes and time intervals |
Prospective analysis frequency | Weekly | Weekly | We performed analyses weekly (as opposed to daily) to match the frequency with which input data were refreshed |
Minimum number of cases | 2 | 2 | We retained the default minimum so as not to miss any emerging clusters |
Signal definition | Recurrence interval (RI) ≥ 365 days | RI ≥ 100 days | We considered RI 100 to <365 days as a weak cluster, RI 365 days to <5 years as a moderate cluster, RI 5 to <100 years as a strong cluster, and RI ≥100 years as a very strong cluster [35] |
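The study-period and window settings can be checked with simple date arithmetic. The sketch below is illustrative only: the function names and the analysis end date are hypothetical, and the 1-year Salmonella study period is assumed to be 365 days. It confirms that the 12-week SARS-CoV-2 study period (84 days) is three times the 28-day maximum window, and it lists the prospective candidate windows, every one of which ends on the study period end date.

```python
from datetime import date, timedelta


def study_period(end: date, length_days: int) -> tuple[date, date]:
    """Inclusive (start, end) of a study period of `length_days` days ending on `end`."""
    return end - timedelta(days=length_days - 1), end


def prospective_windows(end: date, min_days: int, max_days: int) -> list[tuple[date, date]]:
    """Candidate cluster windows for a prospective scan.

    Every window ends on the study period end date; only its length varies,
    from `min_days` to `max_days` inclusive.
    """
    return [(end - timedelta(days=w - 1), end) for w in range(min_days, max_days + 1)]


# Hypothetical analysis date, for illustration only.
END = date(2024, 3, 31)

# SARS-CoV-2: 12-week study period, windows of 14 to 28 days (15 windows).
sars_period = study_period(END, 12 * 7)
sars_windows = prospective_windows(END, 14, 28)

# Salmonella: 1-year study period (assumed 365 days), windows of 1 to 90 days (90 windows).
salm_period = study_period(END, 365)
salm_windows = prospective_windows(END, 1, 90)

print(sars_period, len(sars_windows))
print(salm_period, len(salm_windows))
```

Restricting windows to those ending on the study period end date is what makes the scan prospective: a cluster that ended weeks ago cannot be selected as the most likely cluster.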
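Because patient results may sit at internal nodes as well as at leaves, and because recombinant SARS-CoV-2 lineages were given more than one parent, a tree-based scan typically counts each case at its own node and at every ancestor, once per ancestor even when two paths lead to the same node. The sketch below shows such a roll-up on a toy graph. The node names and the `PARENTS` mapping are hypothetical and are not real Pango relationships, and the source does not state that the analysis aggregated cases exactly this way.

```python
from collections import Counter

# Hypothetical toy lineage graph (child -> parents), for illustration only.
# "XA" stands in for a recombinant lineage with two parents.
PARENTS: dict[str, list[str]] = {
    "root": [],
    "A": ["root"],
    "A.1": ["A"],
    "B": ["root"],
    "B.1": ["B"],
    "XA": ["A.1", "B.1"],
}


def self_and_ancestors(node: str, parents: dict[str, list[str]]) -> set[str]:
    """The node plus every ancestor reachable through any parent, each listed once."""
    seen: set[str] = set()
    stack = [node]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents[current])
    return seen


def branch_counts(case_nodes: list[str], parents: dict[str, list[str]]) -> Counter:
    """Cases per branch: each case counts at its own node and at every ancestor."""
    counts: Counter = Counter()
    for node in case_nodes:
        counts.update(self_and_ancestors(node, parents))
    return counts


# Cases may sit at internal nodes ("B") as well as at leaves ("XA", "A.1").
counts = branch_counts(["XA", "XA", "A.1", "B"], PARENTS)
print(counts["root"], counts["A"], counts["B"])  # 4 3 3
```

Using a set of ancestors prevents a recombinant case from being counted twice at the shared root, even though the root is reachable through both parents.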
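Conditioning on node and time fixes both the total cases in each branch over the study period and the total citywide cases on each day. A common way to score a branch and window under these constraints, assumed here rather than taken from the table, is an expected count E = C_node × C_window / C_total and the conditional Poisson log-likelihood ratio LLR = c ln(c/E) + (C_node − c) ln((C_node − c)/(C_node − E)) when c > E, and 0 otherwise. The function names are hypothetical, the exact statistic should be checked against the TreeScan user guide, and the minimum of 2 cases from the table is applied as a filter.

```python
import math


def expected_count(c_node: int, c_window: int, c_total: int) -> float:
    """Expected cases in a branch during a window, given node and time totals (c_total > 0)."""
    return c_node * c_window / c_total


def log_likelihood_ratio(c: int, c_node: int, c_window: int, c_total: int,
                         min_cases: int = 2) -> float:
    """Conditional Poisson LLR for an excess of cases (high rates only).

    c: cases in the branch during the window
    c_node: cases in the branch over the whole study period
    c_window: cases in all nodes during the window
    c_total: all cases over the whole study period
    """
    e = expected_count(c_node, c_window, c_total)
    if c < min_cases or c <= e:
        return 0.0  # too few cases, or no excess: not a candidate cluster
    outside = c_node - c  # branch cases that fall outside the window
    llr = c * math.log(c / e)
    if outside > 0:
        llr += outside * math.log(outside / (c_node - e))
    return llr


# Toy numbers: a branch with 10 cases in the study period, 50 citywide cases in
# the window, and 500 cases in total gives E = 1.0; observing 6 cases scores ~7.51.
print(round(log_likelihood_ratio(6, 10, 50, 500), 2))
```

Returning 0 whenever c ≤ E reflects the "high rates" setting: only excesses, never deficits, can form a cluster.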
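Under conditioning on node and time, one way to generate null replicates (an assumption here, not a detail given in the table) is to shuffle collection dates among cases, which leaves every node total and every daily total unchanged. The sketch pairs that with a Besag-Clifford style sequential Monte Carlo P value: replicates are drawn one at a time, and the run stops early once a preset number of replicate statistics reach the observed one. The cutoff `stop_after` and all function names are hypothetical; `max_replicates` plays the role of the 999,999 or 99,999 replications in the table.

```python
import random
from collections import Counter
from typing import Callable


def permute_dates(cases: list[tuple[str, int]], rng: random.Random) -> list[tuple[str, int]]:
    """Shuffle day indices among cases; node totals and daily totals are preserved."""
    nodes = [node for node, _ in cases]
    days = [day for _, day in cases]
    rng.shuffle(days)
    return list(zip(nodes, days))


def sequential_mc_pvalue(observed: float, simulate: Callable[[], float],
                         max_replicates: int, stop_after: int) -> tuple[float, int]:
    """Sequential Monte Carlo P value with early termination.

    Returns (p_value, replicates_used). If `stop_after` replicates reach or
    exceed the observed statistic, the run stops with p = exceedances / replicates.
    Otherwise all `max_replicates` are used and p = (exceedances + 1) / (max_replicates + 1).
    """
    exceed = 0
    for i in range(1, max_replicates + 1):
        if simulate() >= observed:
            exceed += 1
            if exceed == stop_after:
                return exceed / i, i  # clearly not unusual: terminate early
    return (exceed + 1) / (max_replicates + 1), max_replicates


# Toy permutation: node and day margins are unchanged after shuffling.
rng = random.Random(42)
cases = [("A", 0), ("A", 1), ("B", 1), ("B", 2), ("B", 2)]
shuffled = permute_dates(cases, rng)

# A replicate statistic that always beats the observed value stops after 50 draws.
print(sequential_mc_pvalue(observed=1.0, simulate=lambda: 2.0,
                           max_replicates=999_999, stop_after=50))  # (1.0, 50)
```

Early termination saves most of the computing time in weeks with nothing unusual, which is what makes a large maximum number of replications affordable.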
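Signals are defined by recurrence interval rather than by P value. Assuming the usual SaTScan/TreeScan convention that the recurrence interval equals the time between analyses divided by the P value, weekly analyses give RI = 7/p days, so the SARS-CoV-2 threshold of 365 days corresponds to p ≤ 7/365 (about 0.019) and the Salmonella threshold of 100 days to p ≤ 0.07. The sketch below maps an RI onto the strength categories in the table; the function names are hypothetical, and a year is taken as 365 days.

```python
SIGNAL_THRESHOLD_DAYS = {"SARS-CoV-2": 365, "Salmonella": 100}
DAYS_PER_YEAR = 365  # assumption used to convert the year-based categories


def recurrence_interval_days(p_value: float, days_between_analyses: int = 7) -> float:
    """RI in days, assuming RI = (time between analyses) / p."""
    return days_between_analyses / p_value


def cluster_strength(ri_days: float) -> str:
    """Strength categories from the signal-definition row of the table."""
    if ri_days >= 100 * DAYS_PER_YEAR:
        return "very strong"
    if ri_days >= 5 * DAYS_PER_YEAR:
        return "strong"
    if ri_days >= 365:
        return "moderate"
    if ri_days >= 100:
        return "weak"
    return "none"


def is_signal(ri_days: float, pathogen: str) -> bool:
    """True if the RI meets the pathogen-specific signal threshold."""
    return ri_days >= SIGNAL_THRESHOLD_DAYS[pathogen]


# Weekly analyses: p = 0.05 gives RI = 140 days (weak), which signals for
# Salmonella (RI >= 100 days) but not for SARS-CoV-2 (RI >= 365 days).
ri = recurrence_interval_days(0.05)
print(round(ri), cluster_strength(ri), is_signal(ri, "Salmonella"), is_signal(ri, "SARS-CoV-2"))
```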