{"data":{"id":9,"backendId":"5dab45b2-9d14-4f95-ab30-b8262bcd0f48","title":"A Zipf-preserving, long-range correlated surrogate for written language and other symbolic sequences","summary":"arXiv:2603.02213v1 Announce Type: new Abstract: Symbolic sequences such as written language and genomic DNA display characteristic frequency distributions and long-range correlations extending over many symbols. In language, this takes the form of Zipf's law for word frequencies together with persistent correlations spanning hundreds or thousands of tokens, while in DNA it is reflected in nucleotide composition and long-memory walks under purine-pyrimidine mappings. Existing surrogate models usu","analysis":"This is first-hand research addressing a technical gap in sequence modeling. It has high novelty for NLP and bioinformatics researchers, though its immediate practical application is specialized.","category":"technology","strategicTrack":"other","capitalRelevance":{"social":2,"cultural":4,"economic":3,"symbolic":2,"technological":8,"informational":9,"temporal":5,"psychological":1,"physical":2},"tags":["NLP","Genomics","Zipf's Law","Fractional Gaussian Noise","Data Synthesis"],"qualityScore":10,"valueScore":7,"interestScore":8,"potentialScore":7,"uniquenessScore":9,"sourceCount":1,"confidence":5,"detectedAt":"2026-03-05T00:09:43.334Z","createdAt":"2026-03-05 00:11:23"}}