Table 1.

Specifications for analyses using tree-temporal scan statistics applied prospectively to genomic surveillance data among NYC residents

| Feature | SARS-CoV-2 | Salmonella | Notes |
|---|---|---|---|
| Genomic data resolution | Pango lineages | Allele codes | The "nodes" in our genomic surveillance trees represent SARS-CoV-2 variants or Salmonella allele codes |
| Temporal element | Specimen collection date | Specimen collection date (or upload date, in sensitivity analyses) | The specimen collection date is the most epidemiologically relevant date, representing when patients sought care. For Salmonella, to accommodate delays between specimen collection and allele code assignment, we also conducted sensitivity analyses in which the temporal element was the date uploaded to the System for Enteric Disease Response, Investigation, and Coordination (SEDRIC) |
| Time precision | Day | Day | We used data at daily resolution (as opposed to aggregating by week or month) to improve precision in cluster start dates |
| Study period | 12-week period ending on the most recent specimen collection date | 1-year period ending on the most recent specimen collection (or upload) date | For SARS-CoV-2, because of rapid variant turnover, we used a short study period (84 days) that was three times as long as the 28-day maximum temporal cluster size (see below). For Salmonella, we used the standard study period of 1 year [35] |
| Only allow data on leaves of tree | No | No | For genomic surveillance data, valid patient results could be anywhere on the tree, not only at the most specific nodes |
| Allow multiple parents for the same node | Yes | No | We assigned multiple parents to recombinant SARS-CoV-2 lineages, effective January 2024. For Salmonella, each node had only one parent |
| Type of scan | Tree and time | Tree and time | We scanned for increases in cases at any node or group of related nodes and over any recent time period |
| Conditional analysis | Node and time | Node and time | We conditioned on time to adjust nonparametrically for any citywide purely temporal patterns, such as data-reporting lags or increasing or decreasing trends. We also conditioned on node to account for whether cases had historically been common or rare at each node during the baseline period, because we were interested in detecting newly emerging nodes, not nodes that were also common during the baseline period |
| Scan for branches with | High rates | High rates | We wished to detect clusters as they emerged, not as they declined |
| Maximum temporal size | 28 days | 90 days | For SARS-CoV-2, we searched for increases in variants during the most recent 14, 15, 16, …, 27, or 28 days to balance recency and persistence. For Salmonella, we searched for allele codes with increases during the most recent 1, 2, 3, …, 89, or 90 days to encompass the standard 60 days in the rule-based Salmonella definition, plus an additional 30 days to accommodate data lags |
| Minimum temporal size | 14 days | 1 day | |
| Prospective evaluation | Yes | Yes | Prospective analyses were used to search for emerging rather than historical clusters by considering only temporal windows that extend to the study period end date |
| Perform node by day-of-week adjustment | No | No | Sequencing results were unlikely to vary by the day of the week on which the specimen was collected |
| Inference method | Sequential Monte Carlo | Sequential Monte Carlo | We used a sequential method with an early termination cutoff, which allowed runs to terminate early if there were no unusual clusters |
| Monte Carlo replications | 999 999 | 99 999 | To slightly improve performance, we used more than the standard 999 Monte Carlo replications, as computing time allowed; computing time depends on the number of tree nodes and time intervals |
| Prospective analysis frequency | Weekly | Weekly | We performed analyses weekly (as opposed to daily) to match the frequency with which input data were refreshed |
| Minimum number of cases | 2 | 2 | We retained the default minimum so as not to miss any emerging clusters |
| Signal definition | Recurrence interval (RI) ≥ 365 days | RI ≥ 100 days | We considered RI 100 to <365 days a weak cluster, RI 365 days to <5 years a moderate cluster, RI 5 to <100 years a strong cluster, and RI ≥100 years a very strong cluster [35] |
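
The tree settings in Table 1 (cases allowed on internal nodes, and multiple parents for recombinant SARS-CoV-2 lineages) determine how cases roll up into branches. The sketch below assumes that a branch is a node plus all of its descendants and that each case counts once toward every ancestor branch, even when it is reachable through two parents. The node names are hypothetical placeholders, not real Pango lineages, and this is an illustration rather than the TreeScan implementation.

```python
from collections import defaultdict

# Hypothetical tree stored as child -> list of parents. "X" stands in for a
# recombinant lineage assigned two parents; all names are illustrative only.
PARENTS: dict[str, list[str]] = {
    "root": [],
    "A": ["root"],
    "B": ["root"],
    "A.1": ["A"],
    "B.1": ["B"],
    "X": ["A.1", "B.1"],  # recombinant: two parents
}


def ancestors(node: str, parents: dict[str, list[str]]) -> set[str]:
    """Return the node itself plus every node reachable by following parent links."""
    seen = {node}
    stack = [node]
    while stack:
        current = stack.pop()
        for parent in parents[current]:
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def branch_counts(case_nodes: list[str], parents: dict[str, list[str]]) -> dict[str, int]:
    """Count cases in each branch (a node plus all of its descendants).

    Cases may sit on internal nodes as well as leaves. Because ancestors are
    collected into a set, a case reachable through two parents still adds
    exactly one to each ancestor branch.
    """
    counts: dict[str, int] = defaultdict(int)
    for node in case_nodes:
        for anc in ancestors(node, parents):
            counts[anc] += 1
    return dict(counts)


# Hypothetical cases: one on an internal node ("A"), two on the recombinant "X".
cases = ["A", "A.1", "X", "X", "B.1"]
counts = branch_counts(cases, PARENTS)
```

Collecting ancestors into a set, rather than walking each parent path separately, is what keeps a recombinant case from being counted twice in a shared ancestor such as `root`.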
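
Conditioning on node and time means that, under the null hypothesis, a branch's share of its own cases falling in a candidate window should match the citywide share of cases in that window. A minimal sketch, assuming a binomial log-likelihood ratio of the kind used in conditional scan statistics (TreeScan's exact formula for this model may differ): with N total cases, n_G cases in branch G over the study period, n_W cases in window W across all nodes, and c cases in both, the expected count is E = n_G × n_W / N, and only excesses (c > E) are scored because the scan looks for high rates. The count matrix and function name are hypothetical.

```python
import math

# Hypothetical case counts: rows are tree nodes, columns are days in the study period.
Counts = list[list[int]]


def branch_window_llr(counts: Counts, branch_rows: list[int], window_cols: list[int]) -> float:
    """Binomial log-likelihood ratio for one branch and one temporal window.

    Conditions on the branch total over the study period (node) and on the
    citywide total in the window (time). Only high rates are scored, so the
    function returns 0.0 when the observed count does not exceed expectation.
    """
    n_days = len(counts[0])
    total = sum(sum(row) for row in counts)
    n_branch = sum(counts[r][d] for r in branch_rows for d in range(n_days))
    n_window = sum(row[d] for row in counts for d in window_cols)
    c = sum(counts[r][d] for r in branch_rows for d in window_cols)
    expected = n_branch * n_window / total  # E = n_G * n_W / N under the null
    if c <= expected:
        return 0.0
    llr = c * math.log(c / expected)
    if n_branch > c:  # outside-window term is 0 when every branch case is in the window
        llr += (n_branch - c) * math.log((n_branch - c) / (n_branch - expected))
    return llr


counts = [
    [2, 2, 2, 2],  # node 0: steady
    [0, 0, 1, 5],  # node 1: recent increase
    [3, 3, 3, 3],  # node 2: steady
]
recent = branch_window_llr(counts, branch_rows=[1], window_cols=[2, 3])
steady = branch_window_llr(counts, branch_rows=[0], window_cols=[2, 3])
```

In a full analysis, a score like this would be maximized over every branch and every allowed window, and that maximum would then be compared with the maxima from Monte Carlo replicates.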
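
The study-period and temporal-size rows reduce to simple date arithmetic: for SARS-CoV-2, 12 weeks is 84 days, or 3 × 28 days, and a prospective scan considers only windows that end on the study period end date. A minimal sketch, assuming inclusive day counts; the end date and function names are hypothetical.

```python
from datetime import date, timedelta


def study_period(end: date, days: int) -> tuple[date, date]:
    """Inclusive study period of `days` days ending on `end`."""
    return end - timedelta(days=days - 1), end


def prospective_windows(end: date, min_days: int, max_days: int) -> list[tuple[date, date]]:
    """Candidate cluster windows of min_days..max_days days, all ending on `end`.

    Prospective evaluation: only windows that reach the study period end date
    are considered, so historical clusters that have already ended are ignored.
    """
    return [(end - timedelta(days=length - 1), end) for length in range(min_days, max_days + 1)]


end = date(2024, 3, 31)  # hypothetical most recent specimen collection date

sars_start, _ = study_period(end, 12 * 7)           # 12 weeks = 84 days = 3 x 28
sars_windows = prospective_windows(end, 14, 28)     # SARS-CoV-2: 14 to 28 days
salm_windows = prospective_windows(end, 1, 90)      # Salmonella: 1 to 90 days
```

Every SARS-CoV-2 window (at most 28 days) fits inside the 84-day study period, which leaves the remaining days of the period as baseline for comparison.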
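
The sequential Monte Carlo setting lets a run stop early when the most likely cluster is clearly unremarkable. Table 1 does not specify the stopping rule, so the sketch below assumes the Besag–Clifford sequential procedure: draw null replicates one at a time, stop as soon as `h` of them reach the observed statistic, and otherwise continue up to the full replication count. `simulate` and `h` are hypothetical parameters, and TreeScan's own early-termination rule may differ.

```python
import random
from typing import Callable


def sequential_mc_pvalue(
    observed: float, simulate: Callable[[], float], max_reps: int, h: int
) -> tuple[float, int]:
    """Besag-Clifford sequential Monte Carlo p-value.

    Draws null replicates one at a time and stops early once `h` replicates
    have a test statistic >= the observed one. Returns (p_value, replicates_used).
    """
    exceed = 0
    for k in range(1, max_reps + 1):
        if simulate() >= observed:
            exceed += 1
            if exceed == h:
                return h / k, k  # early termination: p = h / replicates drawn
    return (exceed + 1) / (max_reps + 1), max_reps  # full run: standard MC p-value


# Usage with a stand-in null distribution (uniform draws). An observed value of
# 0.5 is unremarkable, so the run should stop long before 999 replicates.
rng = random.Random(1)
p, used = sequential_mc_pvalue(0.5, rng.random, max_reps=999, h=50)
```

When the observed statistic is extreme, no early stop occurs and the p-value falls back to the usual (exceedances + 1) / (replicates + 1), so the early cutoff saves time mainly on runs without unusual clusters.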
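
For prospective analyses, a recurrence interval is commonly computed as the analysis interval divided by the p-value, i.e., the expected time between equally strong signals arising by chance alone if analyses ran on that schedule. A minimal sketch, assuming weekly (7-day) analyses and 365-day years for the strength categories in the signal-definition row of Table 1; the function names are hypothetical. Because a Monte Carlo p-value can be no smaller than 1/(R + 1) for R replications, the largest RI a single weekly run can report is 7(R + 1) days.

```python
YEAR_DAYS = 365  # assumption: 365-day years for the strength thresholds


def recurrence_interval_days(p_value: float, analysis_interval_days: float = 7.0) -> float:
    """Expected days between chance signals this strong, given the analysis schedule."""
    return analysis_interval_days / p_value


def max_reportable_ri_days(replications: int, analysis_interval_days: int = 7) -> int:
    """Largest RI one run can report: the smallest MC p-value is 1 / (R + 1)."""
    return analysis_interval_days * (replications + 1)


def cluster_strength(ri_days: float) -> str:
    """Strength categories from the signal-definition notes in Table 1."""
    if ri_days >= 100 * YEAR_DAYS:
        return "very strong"
    if ri_days >= 5 * YEAR_DAYS:
        return "strong"
    if ri_days >= 365:
        return "moderate"
    if ri_days >= 100:
        return "weak"
    return "none"


def is_signal(ri_days: float, pathogen: str) -> bool:
    """Pathogen-specific signal thresholds: RI >= 365 days or RI >= 100 days."""
    threshold = {"SARS-CoV-2": 365, "Salmonella": 100}[pathogen]
    return ri_days >= threshold


ri = recurrence_interval_days(0.01)  # weekly analyses, p = 0.01
strength = cluster_strength(ri)
```

With these thresholds, a weak cluster (RI 100 to <365 days) counts as a signal for Salmonella but not for SARS-CoV-2.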