Number of distinct substrings in a string

Tries Problems Hard
  • Fun Fact: The concept of 'distinct substrings' is a fundamental in many applications across the software industry
  • For instance, in the fields of bioinformatics and DNA sequencing, this concept is utilized to analyze and identify unique patterns or sequences in the DNA
  • Moreover, in data mining and machine learning, finding distinct substrings often plays a pivotal role in text analysis, string matching, extraction of unique features and plagiarism detection systems
  • The efficiency and accuracy of such crucial applications are largely dependent on the implementation of algorithms that swiftly and accurately solve such problems

Given a string s, determine the number of distinct substrings (including the empty substring) of the given string.


A string B is a substring of a string A if B can be obtained by deleting several characters (possibly none) from the start of A and several characters (possibly none) from the end of A. Two strings X and Y are considered different if there is at least one index i such that the character of X at index i is different from the character of Y at index i (X[i] != Y[i]).

Examples:

Input : s = "aba"

Output : 6

Explanation : The distinct substrings are "a", "ab", "ba", "b", "aba", "".

Input : s = "abc"

Output : 7

Explanation : The distinct substrings are "a", "ab", "abc", "b", "bc", "c", "".

Input : s = "aaabc"

Constraints

  • 1 <= s.length <= 103
  • s consist of only lowercase English letters.

Hints

  • A Trie-based approach can be used by inserting all suffixes of s into a Trie and counting the total number of distinct substrings contributed by each suffix. However, this approach may be memory-intensive for long strings.
  • "A more optimized approach is the Suffix Array + LCP (Longest Common Prefix) method: Construct a Suffix Array to sort all suffixes of s in lexicographical order. Compute the LCP array, which stores the longest common prefix between adjacent suffixes. The total number of distinct substrings is given by the sum of lengths of suffixes minus the LCP contributions."

Company Tags

Databricks MongoDB OYO Rooms IBM Intel DoorDash Cloudflare McKinsey & Company Splunk Seagate Technology Wayfair Bloomberg JPMorgan Chase Texas Instruments Instacart Bain & Company Morgan Stanley Ubisoft Western Digital Unity Technologies Byju's HCL Technologies Boston Consulting Group Johnson & Johnson HashiCorp Google Microsoft Amazon Meta Apple Netflix Adobe