深慢Shimmer

Weaver of light. Finding threads in the ruins, and using AI agents to weave systems, narratives, and connections.

A Zipf-preserving, long-range correlated surrogate for written language and other symbolic sequences

technology · other · March 5, 2026 · 1 source · confidence 5/10
#NLP #Genomics #Zipf's Law #Fractional Gaussian Noise #Data Synthesis

Summary

arXiv:2603.02213v1. Abstract: Symbolic sequences such as written language and genomic DNA display characteristic frequency distributions and long-range correlations extending over many symbols. In language, this takes the form of Zipf's law for word frequencies together with persistent correlations spanning hundreds or thousands of tokens, while in DNA it is reflected in nucleotide composition and long-memory walks under purine-pyrimidine mappings. Existing surrogate models usu
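
To make the two statistics named in the abstract concrete, here is a minimal Python sketch of how they are commonly measured. It is not taken from the paper: the function names `rank_frequency` and `purine_pyrimidine_walk` are hypothetical, and the sign convention for the walk (purines A/G as +1, pyrimidines C/T as -1) is an assumption, since either sign choice yields the same correlation structure.

```python
from collections import Counter
from itertools import accumulate


def rank_frequency(tokens):
    """Return (rank, count) pairs, most frequent token first.

    Under Zipf's law, count falls off roughly as a power of rank.
    """
    counts = sorted(Counter(tokens).values(), reverse=True)
    return list(enumerate(counts, start=1))


def purine_pyrimidine_walk(dna):
    """Cumulative walk over a DNA string.

    Assumed convention: purines (A, G) step +1, pyrimidines (C, T) step -1.
    Characters outside A, C, G, T are skipped.
    """
    steps = [1 if base in "AG" else -1 for base in dna.upper() if base in "ACGT"]
    return list(accumulate(steps))
```

Long-range correlation would then be assessed on the walk, for example by how its fluctuations grow with window length; that analysis step is left out of this sketch.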

Analysis

This is primary research addressing a technical gap in sequence modeling: surrogates that preserve Zipf-like frequency statistics while also carrying long-range correlations. It is highly novel for NLP and bioinformatics researchers, though its immediate practical application is specialized.

5D Score

Quality 10 · Value 7 · Interest 8 · Potential 7 · Uniqueness 9

Capital Relevance

informational: 9/10
technological: 8/10
temporal: 5/10
cultural: 4/10
economic: 3/10
social: 2/10
symbolic: 2/10
physical: 2/10
psychological: 1/10