Version of C# StringBuilder to allow for strings larger than 2 billion characters

In C#, the combination of 64-bit Windows, .NET 4.5 (or later), and enabling gcAllowVeryLargeObjects in the App.config file allows for objects larger than two gigabytes. That's cool, but unfortunately, the maximum number of elements that C# allows in an array is still limited to about 2^31 ≈ 2.15 billion. Testing confirmed this.

To overcome this, Microsoft recommends in Option B creating the arrays natively. The problem is that this requires unsafe code, and as far as I know, Unicode won't be supported, at least not easily.

So I ended up creating my own BigStringBuilder class. It's a list where each list element (or page) is a char array (type List<char[]>).
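To make the paging idea concrete, here is a minimal sketch of how a global character index splits into a page number and an offset within that page. The PageMath class and its Locate method are hypothetical names used only for illustration; the arithmetic mirrors the shift-and-mask indexer in the class, assuming pages of 2^pageDepth chars.

```csharp
using System;

public static class PageMath
{
    // Hypothetical helper: splits a global character index into (page, offset)
    // for pages holding 2^pageDepth chars each.
    public static (int Page, int Offset) Locate(long index, int pageDepth)
    {
        long mask = (1L << pageDepth) - 1;          // low bits select the offset
        int page = (int)(index >> pageDepth);       // high bits select the page
        int offset = (int)(index & mask);
        return (page, offset);
    }
}
```

With the default page depth of 12 (4096 chars per page), PageMath.Locate(10_000, 12) gives page 2, offset 1808, since 2 * 4096 + 1808 = 10,000.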

Provided you're using 64-bit Windows, you can now easily surpass the 2-billion-element limit. I managed to test creating a giant string around 32 gigabytes large (I needed to increase virtual memory in the OS first; otherwise I could only get to around 7GB on my 8GB RAM PC). I'm sure it handles more than 32GB easily. In theory, it should be able to handle around 1,000,000,000 * 1,000,000,000 chars, or one quintillion characters, which should be enough for anyone!

I also added some common functions such as fileSave(), length(), substring(), replace(), etc. As with StringBuilder, in-place character writing (mutability) and instant truncation are possible.

Speed-wise, some quick tests show that it's not significantly slower than a StringBuilder when appending (I found it was 33% slower in one test). I got similar performance with a 2D jagged char array (char[][]) instead of List<char[]>, but Lists are simpler to work with, so I stuck with that.

I'm looking for advice to potentially speed up performance, particularly for the append function, and to access or write faster via the indexer (public char this[long n] {...} )

// A simplified version specially for StackOverflow / Codereview
using System;
using System.Collections.Generic;
using System.IO;
using System.Text;

public class BigStringBuilder
{
    List<char[]> c = new List<char[]>();    // the pages; each element is one char[] of length pagesize
    private int pagedepth;
    private long pagesize;
    private long mpagesize;         // https://stackoverflow.com/questions/11040646/faster-modulus-in-c-c
    private int currentPage = 0;
    private int currentPosInPage = 0;

    public BigStringBuilder(int pagedepth = 12) {   // pagesize is 2^pagedepth (since must be a power of 2 for a fast indexer)
        this.pagedepth = pagedepth;
        pagesize = (long)Math.Pow(2, pagedepth);
        mpagesize = pagesize - 1;
        c.Add(new char[pagesize]);
    }

    // Indexer for this class, so you can use convenient square bracket indexing to address char elements within the array!!
    public char this[long n]    {
        get { return c[(int)(n >> pagedepth)][n & mpagesize]; }
        set { c[(int)(n >> pagedepth)][n & mpagesize] = value; }
    }

    public string[] returnPagesForTestingPurposes() {
        string[] s = new string[currentPage + 1];
        for (int i = 0; i < currentPage + 1; i++) s[i] = new string(c[i]);
        return s;
    }

    public void clear() {
        c = new List<char[]>();
        c.Add(new char[pagesize]);
        currentPage = 0;
        currentPosInPage = 0;
    }

    // See: https://stackoverflow.com/questions/373365/how-do-i-write-out-a-text-file-in-c-sharp-with-a-code-page-other-than-utf-8/373372
    public void fileSave(string path)   {
        StreamWriter sw = File.CreateText(path);
        for (int i = 0; i < currentPage; i++) sw.Write(new string(c[i]));
        sw.Write(new string(c[currentPage], 0, currentPosInPage));
        sw.Close();
    }

    public void fileOpen(string path)   {
        clear();
        StreamReader sw = new StreamReader(path);
        int len = 0;
        while ((len = sw.ReadBlock(c[currentPage], 0, (int)pagesize)) != 0){
            if (!sw.EndOfStream)    {
                currentPage++;
                if (currentPage == c.Count) c.Add(new char[pagesize]);
            }
            else    {
                currentPosInPage = len;
                break;
            }
        }
        sw.Close();
    }

    public long length()    {
        return (long)currentPage * (long)pagesize + (long)currentPosInPage;
    }

    public string ToString(long max = 2000000000)   {
        if (length() < max) return substring(0, length());
        else return substring(0, max);
    }

    public string substring(long x, long y) {
        StringBuilder sb = new StringBuilder();
        for (long n = x; n < y; n++) sb.Append(c[(int)(n >> pagedepth)][n & mpagesize]);    //8s
        return sb.ToString();
    }

    public bool match(string find, long start = 0)  {
        //if (s.Length > length()) return false;
        for (int i = 0; i < find.Length; i++) if (i + start == find.Length || this[start + i] != find[i]) return false;
        return true;
    }

    public void replace(string s, long pos) {
        for (int i = 0; i < s.Length; i++)  {
            c[(int)(pos >> pagedepth)][pos & mpagesize] = s[i];
            pos++;
        }
    }

    // Simple implementation of an append() function. Testing shows this to be about
    // as fast or faster than the more sophisticated Append2() function further below
    // despite its simplicity:
    public void Append(string s)
    {
        for (int i = 0; i < s.Length; i++)
        {
            c[currentPage][currentPosInPage] = s[i];
            currentPosInPage++;
            if (currentPosInPage == pagesize)
            {
                currentPosInPage = 0;
                currentPage++;
                if (currentPage == c.Count) c.Add(new char[pagesize]);
            }
        }
    }

    // This method is a more sophisticated version of the Append() function above.
    // Surprisingly, in real-world testing, it doesn't seem to be any faster.
    public void Append2(string s)
    {
        if (currentPosInPage + s.Length <= pagesize)
        {
            // append s entirely to current page
            for (int i = 0; i < s.Length; i++)
            {
                c[currentPage][currentPosInPage] = s[i];
                currentPosInPage++;
            }
        }
        else
        {
            int stringpos;
            int topup = (int)pagesize - currentPosInPage;
            // Finish off current page with substring of s
            for (int i = 0; i < topup; i++)
            {
                c[currentPage][currentPosInPage] = s[i];
                currentPosInPage++;
            }
            currentPage++;
            currentPosInPage = 0;
            stringpos = topup;
            int remainingPagesToFill = (s.Length - topup) >> pagedepth; // We want the floor here
            // fill up complete pages if necessary:
            if (remainingPagesToFill > 0)
            {
                for (int i = 0; i < remainingPagesToFill; i++)
                {
                    if (currentPage == c.Count) c.Add(new char[pagesize]);
                    for (int j = 0; j < pagesize; j++)
                    {
                        c[currentPage][j] = s[stringpos];
                        stringpos++;
                    }
                    currentPage++;
                }
            }
            // finish off remainder of string s on new page:
            if (currentPage == c.Count) c.Add(new char[pagesize]);
            for (int i = stringpos; i < s.Length; i++)
            {
                c[currentPage][currentPosInPage] = s[i];
                currentPosInPage++;
            }
        }
    }
}
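One variation that might be worth benchmarking against the two append methods above: copy each page-sized chunk with string.CopyTo instead of a per-character loop. The following is only a sketch under assumptions, not a measured improvement. PagedAppendSketch and its member names are hypothetical, the page size is held in an int (so the page depth is assumed to be at most 30), and a full page is advanced lazily on the next write rather than eagerly as in Append().

```csharp
using System;
using System.Collections.Generic;

// Hypothetical, stripped-down paged buffer used only to illustrate block-copy appends.
public class PagedAppendSketch
{
    private readonly List<char[]> pages = new List<char[]>();
    private readonly int pageSize;
    private int currentPage = 0;
    private int currentPosInPage = 0;

    public PagedAppendSketch(int pageDepth = 12)
    {
        pageSize = 1 << pageDepth;              // assumes 0 <= pageDepth <= 30
        pages.Add(new char[pageSize]);
    }

    public long Length => (long)currentPage * pageSize + currentPosInPage;

    // Read-only indexer, present only so the result can be inspected.
    public char this[long n] => pages[(int)(n / pageSize)][(int)(n % pageSize)];

    public void Append(string s)
    {
        int copied = 0;
        while (copied < s.Length)
        {
            // Move to a fresh page when the current one is full.
            if (currentPosInPage == pageSize)
            {
                currentPage++;
                currentPosInPage = 0;
                if (currentPage == pages.Count) pages.Add(new char[pageSize]);
            }

            // Copy as much of s as fits into the rest of the current page in one call.
            int count = Math.Min(s.Length - copied, pageSize - currentPosInPage);
            s.CopyTo(copied, pages[currentPage], currentPosInPage, count);
            copied += count;
            currentPosInPage += count;
        }
    }
}
```

Whether this is actually faster depends on typical string lengths and the page size, so it would need to be timed on real data.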

asked 13 hours ago by Dan W (new contributor)
edited 6 hours ago by Simon Forsberg♦

So what kind of crazy stuff does one need such large data types for?

– t3chb0t
11 hours ago

ok... but isn't streaming it easier and faster than loading the entire file into memory? It screams: the XY Problem. Your users are not responsible for you wasting RAM :-P

– t3chb0t
11 hours ago

The question you should be asking is how you can convert this giant CSV more efficiently rather than brute-forcing it into your RAM.

– t3chb0t
10 hours ago

oh boy... this sounds like you're pushing json over csv... this is even more scary than I thought. This entire concept seems to be pretty odd :-| Why don't you do the filtering on the fly? Read, filter, write...? Anyway, have fun with this monster solution ;-]

– t3chb0t
10 hours ago

@DanW: it still sounds like treating the input as one giant string is not the most efficient approach. If you really can't process it in a streaming fashion, then did you look into specialized data structures such as ropes, gap buffers, piece tables, that sort of stuff?

– Pieter Witvoet
10 hours ago

c# performance strings pagination


1 Answer

    List<char[]> c = new List<char[]>();
    private int pagedepth;
    private long pagesize;
    private long mpagesize;         // https://stackoverflow.com/questions/11040646/faster-modulus-in-c-c
    private int currentPage = 0;
    private int currentPosInPage = 0;

Some of these names are rather cryptic. I'm not sure why c isn't private. And surely some of the fields should be readonly?

        pagesize = (long)Math.Pow(2, pagedepth);

IMO it's better style to use 1L << pagedepth.
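To illustrate, here is a minimal sketch of the shift-based arithmetic, assuming the page size is always a power of two. `PageMath` and its method names are hypothetical and not taken from the question's code.

```csharp
using System;

// Hypothetical helper; the names are illustrative, not from the question.
public static class PageMath
{
    // 2^pageDepth computed exactly in integer arithmetic (no Math.Pow, no double).
    public static long PageSize(int pageDepth) { return 1L << pageDepth; }

    // Low pageDepth bits set; n & mask equals n % pageSize for n >= 0.
    public static long OffsetMask(int pageDepth) { return (1L << pageDepth) - 1; }

    // Which page a global index falls into.
    public static int PageIndex(long n, int pageDepth) { return (int)(n >> pageDepth); }

    // Position of a global index within its page.
    public static int OffsetInPage(long n, int pageDepth) { return (int)(n & OffsetMask(pageDepth)); }
}
```

Staying in integer arithmetic avoids the round trip through double that Math.Pow implies.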

    public char this[long n]    {
        get { return c[(int)(n >> pagedepth)][n & mpagesize]; }
        set { c[(int)(n >> pagedepth)][n & mpagesize] = value; }
    }

Shouldn't this have bounds checks?
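One way this could look, as a sketch rather than a drop-in replacement: `PagedChars` below is a hypothetical cut-down buffer (single-character append only) that exists just to show the check in the indexer. Whether the check costs too much on a hot path is something you would need to measure.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical, cut-down paged buffer used only to illustrate the bounds check.
public class PagedChars
{
    private readonly List<char[]> pages = new List<char[]>();
    private readonly int pageDepth;
    private readonly long pageMask;
    private long length;

    public PagedChars(int pageDepth)
    {
        this.pageDepth = pageDepth;
        pageMask = (1L << pageDepth) - 1;
    }

    public long Length { get { return length; } }

    public void Append(char ch)
    {
        int offset = (int)(length & pageMask);
        if (offset == 0) pages.Add(new char[1 << pageDepth]);   // start a new page
        pages[pages.Count - 1][offset] = ch;
        length++;
    }

    public char this[long n]
    {
        get
        {
            CheckIndex(n);
            return pages[(int)(n >> pageDepth)][n & pageMask];
        }
        set
        {
            CheckIndex(n);
            pages[(int)(n >> pageDepth)][n & pageMask] = value;
        }
    }

    private void CheckIndex(long n)
    {
        // Without this, an index past the logical end but inside the last
        // allocated page would silently read or write unused capacity.
        if (n < 0 || n >= length)
            throw new ArgumentOutOfRangeException("n");
    }
}
```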

    public string[] returnPagesForTestingPurposes() {
        string[] s = new string[currentPage + 1];
        for (int i = 0; i < currentPage + 1; i++) s[i] = new string(c[i]);
        return s;
    }

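There's no need for this to be public: you can make it internal and give your unit test project access with [assembly:InternalsVisibleTo]. Also, since it's for testing purposes, it could probably be excluded from release builds, e.g. with #if DEBUG. ([System.Diagnostics.Conditional("DEBUG")] would otherwise be the obvious choice, but it is only valid on methods that return void.)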
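For reference, the attribute goes at assembly level in the project that contains the builder. The assembly name below is a placeholder for whatever your test project is actually called.

```csharp
using System.Runtime.CompilerServices;

// "BigStringBuilder.Tests" is a placeholder for the test project's assembly name.
[assembly: InternalsVisibleTo("BigStringBuilder.Tests")]
```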

    public void clear() {
        c = new List<char[]>();
        c.Add(new char[pagesize]);

In C# it's conventional for method names to start with an upper case letter.

There's no need to throw quite as much to the garbage collector. Consider as an alternative:

    var page0 = c[0];
    c.Clear();
    c.Add(page0);

    // See: https://stackoverflow.com/questions/373365/how-do-i-write-out-a-text-file-in-c-sharp-with-a-code-page-other-than-utf-8/373372

Why? I don't think it sheds any light on the following method.

    public void fileSave(string path)   {
        StreamWriter sw = File.CreateText(path);
        for (int i = 0; i < currentPage; i++) sw.Write(new string(c[i]));
        sw.Write(new string(c[currentPage], 0, currentPosInPage));
        sw.Close();
    }

Missing some disposal: I'd use a using statement.

new string(char[]) copies the entire array to ensure that the string is immutable. That's completely unnecessary here: StreamWriter has a method Write(char[], int, int).
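Putting both points together, a sketch might look like the following. `PagedWriter` and its parameter list are hypothetical; I've split out an overload taking a TextWriter only so that the logic can be exercised without touching the file system.

```csharp
using System.Collections.Generic;
using System.IO;

// Hypothetical helper; in the real class these would be instance members.
public static class PagedWriter
{
    // Writes every full page, then the used part of the last page,
    // straight from the char arrays (no intermediate strings).
    public static void Save(TextWriter writer, IReadOnlyList<char[]> pages,
                            int currentPage, int currentPosInPage)
    {
        for (int i = 0; i < currentPage; i++)
            writer.Write(pages[i], 0, pages[i].Length);
        writer.Write(pages[currentPage], 0, currentPosInPage);
    }

    public static void Save(string path, IReadOnlyList<char[]> pages,
                            int currentPage, int currentPosInPage)
    {
        // The using statement disposes (and so flushes and closes) the
        // writer even if a Write call throws.
        using (StreamWriter sw = File.CreateText(path))
            Save(sw, pages, currentPage, currentPosInPage);
    }
}
```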

    public void fileOpen(string path)   {
        clear();

Yikes! That should be mentioned in the method documentation.

        StreamReader sw = new StreamReader(path);
        int len = 0;
        while ((len = sw.ReadBlock(c[currentPage], 0, (int)pagesize)) != 0){
            if (!sw.EndOfStream)    {
                currentPage++;
                if (currentPage == c.Count) c.Add(new char[pagesize]);
            }
            else    {
                currentPosInPage = len;
                break;

I think this can give rise to inconsistencies. Other methods seem to assume that if the length of the BigStringBuilder is an exact multiple of pagesize then currentPosInPage == 0 and c[currentPage] is empty, but this can give you currentPosInPage == pagesize and c[currentPage] is full.

This method is also missing disposal.
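A sketch of a loop that addresses both points, assuming the invariant I inferred above (a full page is always followed by a fresh, empty current page). `PagedReader`, its signature and the out parameters are hypothetical; the TextReader overload is there only to make it testable.

```csharp
using System.Collections.Generic;
using System.IO;

// Hypothetical helper; in the real class these would be instance members.
public static class PagedReader
{
    public static void Load(TextReader reader, List<char[]> pages, int pageSize,
                            out int currentPage, out int currentPosInPage)
    {
        pages.Clear();
        pages.Add(new char[pageSize]);
        currentPage = 0;
        currentPosInPage = 0;

        while (true)
        {
            // ReadBlock only returns fewer chars than requested at end of input.
            int read = reader.ReadBlock(pages[currentPage], currentPosInPage,
                                        pageSize - currentPosInPage);
            if (read == 0) break;
            currentPosInPage += read;
            if (currentPosInPage == pageSize)
            {
                // Page is full: move on, so the position never equals pageSize.
                pages.Add(new char[pageSize]);
                currentPage++;
                currentPosInPage = 0;
            }
        }
    }

    public static void Load(string path, List<char[]> pages, int pageSize,
                            out int currentPage, out int currentPosInPage)
    {
        using (var reader = new StreamReader(path))
            Load(reader, pages, pageSize, out currentPage, out currentPosInPage);
    }
}
```

The price is one spare empty page when the input length is an exact multiple of the page size.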

    public long length()    {
        return (long)currentPage * (long)pagesize + (long)currentPosInPage;
    }

Why is this a method rather than a property? Why use multiplication rather than <<?

    public string substring(long x, long y) {
        StringBuilder sb = new StringBuilder();
        for (long n = x; n < y; n++) sb.Append(c[(int)(n >> pagedepth)][n & mpagesize]);    //8s

What is 8s? Why append one character at a time? StringBuilder also has a method which takes (char[], int, int).
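A sketch of the chunked version, assuming the same half-open range [x, y) as the original loop. `PagedSubstring` and its parameters are hypothetical stand-ins for the class's fields.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Hypothetical helper; in the real class this would be an instance method.
public static class PagedSubstring
{
    // Copies the half-open range [start, end), one page-sized run at a time.
    public static string Substring(IReadOnlyList<char[]> pages, int pageDepth,
                                   long start, long end)
    {
        if (start < 0 || end < start)
            throw new ArgumentOutOfRangeException("start");

        long mask = (1L << pageDepth) - 1;
        // A .NET string cannot exceed int.MaxValue chars; checked makes that explicit.
        var sb = new StringBuilder(checked((int)(end - start)));
        long n = start;
        while (n < end)
        {
            char[] page = pages[(int)(n >> pageDepth)];
            int offset = (int)(n & mask);
            // Either the rest of this page or the rest of the range, whichever is shorter.
            int count = (int)Math.Min(page.Length - offset, end - n);
            sb.Append(page, offset, count);
            n += count;
        }
        return sb.ToString();
    }
}
```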

    public bool match(string find, long start = 0)  {
        //if (s.Length > length()) return false;
        for (int i = 0; i < find.Length; i++) if (i + start == find.Length || this[start + i] != find[i]) return false;
        return true;
    }

What does this method do? The name implies something regexy, but there's no regex in sight. The implementation looks like StartsWith (by default - the offset complicates it).
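If the intent really is "does the text at position start begin with find?", which is my guess rather than something the question states, then one reading is sketched below. `PagedMatch.StartsWithAt` is a hypothetical name, and the length check compares against the builder's length rather than find.Length.

```csharp
using System.Collections.Generic;

// Hypothetical helper; in the real class this would be an instance method.
public static class PagedMatch
{
    // True if the characters at [start, start + find.Length) equal find.
    public static bool StartsWithAt(IReadOnlyList<char[]> pages, int pageDepth,
                                    long length, string find, long start)
    {
        // Reject ranges that would run past the logical end of the buffer.
        if (start < 0 || find.Length > length - start) return false;

        long mask = (1L << pageDepth) - 1;
        for (int i = 0; i < find.Length; i++)
        {
            long n = start + i;
            if (pages[(int)(n >> pageDepth)][n & mask] != find[i]) return false;
        }
        return true;
    }
}
```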

    public void replace(string s, long pos) {
        for (int i = 0; i < s.Length; i++)  {
            c[(int)(pos >> pagedepth)][pos & mpagesize] = s[i];
            pos++;
        }
    }

Bounds checks?

    // This method is a more sophisticated version of the Append() function above.
    // Surprisingly, in real-world testing, it doesn't seem to be any faster.

I'm not surprised. It's still copying character by character. It's almost certainly faster to use string.CopyTo (thanks to Pieter Witvoet for mentioning this method) or ReadOnlySpan.CopyTo.
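As a rough sketch of the string.CopyTo approach: `PagedAppender` below is a hypothetical minimal class, with field names borrowed from the question, that copies as much of the string as fits into the current page in one call. I haven't benchmarked it against the original, so treat any speed-up as something to verify.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Hypothetical minimal class, only to illustrate block-wise appending.
public class PagedAppender
{
    private readonly List<char[]> pages = new List<char[]>();
    private readonly int pageSize;
    private int currentPage;
    private int currentPosInPage;

    public PagedAppender(int pageDepth)
    {
        pageSize = 1 << pageDepth;
        pages.Add(new char[pageSize]);
    }

    public long Length
    {
        get { return (long)currentPage * pageSize + currentPosInPage; }
    }

    public void Append(string s)
    {
        int copied = 0;
        while (copied < s.Length)
        {
            // Copy whatever fits into the remainder of the current page.
            int count = Math.Min(pageSize - currentPosInPage, s.Length - copied);
            s.CopyTo(copied, pages[currentPage], currentPosInPage, count);
            copied += count;
            currentPosInPage += count;

            if (currentPosInPage == pageSize)
            {
                currentPage++;
                currentPosInPage = 0;
                if (currentPage == pages.Count) pages.Add(new char[pageSize]);
            }
        }
    }

    // Only sensible for small contents; included so the sketch can be checked.
    public override string ToString()
    {
        var sb = new StringBuilder();
        for (int i = 0; i < currentPage; i++) sb.Append(pages[i], 0, pageSize);
        sb.Append(pages[currentPage], 0, currentPosInPage);
        return sb.ToString();
    }
}
```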

edited 10 hours ago

answered 11 hours ago

Peter Taylor


$begingroup$
Thanks! Added responses to my main post if you want to look.
$endgroup$
– Dan W
10 hours ago

1

$begingroup$
Regarding the last point, there's also string.CopyTo.
$endgroup$
– Pieter Witvoet
10 hours ago

$begingroup$
c# class instance members are private by default, so c is private. But you're right that it is inconsistent to not explicitly declare it private like the other fields are. docs.microsoft.com/en-us/dotnet/csharp/language-reference/…
$endgroup$
– BurnsBA
7 hours ago


