Forum


English find length of string outside brackets

17 replies

old find length of string outside brackets

Torque
User Off Offline

I want to find the length of a string containing x characters outside the brackets.

Specific example: I have a string like 'abc(123)def ghij klm (456) opqrs'. Now I want to cut that string at the point where there are 10 characters that are not between brackets. So in this case between i and j, at position 15. How do I do this?

I came up with this

str = "abc(123)def ghij klm (456) opqrs"
maxchars = 10 
position = maxchars + (maxchars - str:sub(1,maxchars):gsub("%b()",""):len())

But that doesn't work when the brackets are not closed within the first 10 characters.
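
To make the failure concrete, here is a small sketch that wraps the one-liner above in a function (the name `naivePosition` is made up for illustration) and runs it with two values of maxchars. With 10 it happens to give the right answer, but with 5 the cut lands inside the still-open "(123)", so %b() finds nothing to strip.

```lua
-- Hypothetical wrapper around the one-liner from the post, for testing only.
function naivePosition(str, maxchars)
	-- gsub returns two values; the assignment keeps only the stripped string
	local outside = str:sub(1, maxchars):gsub("%b()", "")
	return maxchars + (maxchars - outside:len())
end

local s = "abc(123)def ghij klm (456) opqrs"
print(naivePosition(s, 10)) --> 15 (correct: "abc(123)def ghi")
print(naivePosition(s, 5))  --> 5  (wrong: the wanted cut is at 10, "abc(123)de")
```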

Thanks in advance for your help!

old Re: find length of string outside brackets

Alpha Beta
User Off Offline

You have to split up that long third line first. You gotta check if the maxchars is long enough.
So sub the string by maxchars, and if it has the pattern "...(...", you have to make it longer, until it reaches a ")".
Then just do the rest.

I don't use gsub often, so I have to figure it out as well.
edited 4×, last 02.12.14 09:20:12 pm

old Re: find length of string outside brackets

Torque
User Off Offline

Thanks for your suggestions guys
@user Alistaire:
Input: Any string of any length, any integer X.
Output: The substring of Input that contains X characters that are not within round brackets '()'.

So in the example string, if X=5 it should return "abc(123)de". This is the substring that has 5 characters outside the brackets. If X=10 it should return "abc(123)def ghi", this substring has 10 characters outside the brackets.

@user Joni And Friends:
Your code returns
s:len() - maxchars

old Re: find length of string outside brackets

DC
Admin Off Offline

From what I understood he wants something like string.sub but it should ignore all chars which are enclosed in "(" and ")" in the length (but still return them in the result).

This code should do:
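function getSubstrWithoutBrackets(str, len)
	local count = 0        -- characters counted outside brackets so far
	local bracket = false  -- true while we are between "(" and ")"
	for i = 1, str:len() do
		local sub = str:sub(i, i)
		if not bracket then
			if sub == "(" then
				bracket = true
			else
				count = count + 1
				if count >= len then
					-- return the substring itself; the caller prints it
					return str:sub(1, i)
				end
			end
		elseif sub == ")" then
			bracket = false
		end
	end
	-- target count never reached: return the whole string
	return str
end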

print(getSubstrWithoutBrackets("abc(123)def ghij klm (456) opqrs", 5))
print(getSubstrWithoutBrackets("abc(123)def ghij klm (456) opqrs", 18))
print(getSubstrWithoutBrackets("abc(123)def ghij klm (456) opqrs", 100))
Tested it here: http://www.lua.org/cgi-bin/demo

This is the plain and stupid programmatic approach. Simply iterate over the string and increase a counter unless the chars are within ( and ). Return the string up to the current position when the counter reaches the target size. Return the entire string if the target size isn't reached while iterating.

It's probably also possible with fancy regex somehow, and much shorter than this code, but I'm no regex pro.
edited 1×, last 03.12.14 01:20:43 pm

old Re: find length of string outside brackets

VADemon
User Off Offline

@user Torque:
After failing to improvise (because of math!), I made a draft on paper and wrote it all down as a little algorithm:
Because iterating through every letter is for DCs


It's actually not hard. Every iteration it searches for brackets (.-) and if it finds one, it calculates the amount of real text behind it (excluding brackets) just to know when to stop. At the same time it counts how long the bracket-texts are. Finally this number is added like this: yourString:sub(len + bracketLength) which gives us the result.

Rename the function as you wish. Whereby
bracketL is the total length of brackets
textL is just a raw real text length
prev2, curr1, curr2 = 1 is the opening bracket, 2 the closing one.
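
A rough sketch of the find-based idea described above, not VADemon's original code. It jumps from one bracket group to the next with string.find instead of visiting every character, and keeps the textL / bracketL bookkeeping from the description. The function name, the pattern "%(.-%)" and the early return for len <= 0 are assumptions of this sketch. It only understands non-nested brackets, but the search position moves forward on every pass, so it cannot loop forever.

```lua
-- Sketch only: assumes simple, non-nested "(...)" groups.
function cutOutsideBrackets(str, len)
	if len <= 0 then return "" end
	local textL = 0     -- plain text counted so far (outside brackets)
	local bracketL = 0  -- total length of the bracket groups skipped so far
	local pos = 1       -- where the next search starts
	while true do
		local open, close = str:find("%(.-%)", pos)
		if not open then break end
		local plain = open - pos  -- plain characters before this group
		if textL + plain >= len then break end  -- cut point lies before the group
		textL = textL + plain
		bracketL = bracketL + (close - open + 1)
		pos = close + 1  -- always advances, so the loop terminates
	end
	return str:sub(1, len + bracketL)
end

print(cutOutsideBrackets("abc(123)def ghij klm (456) opqrs", 10)) --> abc(123)def ghi
```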

old Re: find length of string outside brackets

Torque
User Off Offline

Thanks for your contribution VADemon. This problem might be harder than you think:
thestring = "ab(c(123)def(1) g(a=false)hijk4325$#25432lmf(325)opqus"
Catches your program in an infinite loop

old Re: find length of string outside brackets

DC
Admin Off Offline

My code doesn't handle encapsulated brackets correctly either because you didn't mention this as a requirement. It doesn't crash but it will return wrong values because it will continue to count as soon as one single ")" occurs after any number of "(". It's rather trivial to fix though: Replace the bracket boolean with a counter which starts with 0 and which counts the bracket depth. +1 for "(" and -1 for ")". Then only increase the regular counter when the bracket depth is 0. Otherwise just check for "(" and ")" and change the bracket depth accordingly.
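
A minimal sketch of the depth-counter change described above. The function name is made up for illustration, and silently ignoring a stray ")" at depth 0 is an assumption of this sketch rather than something stated in the post.

```lua
-- Sketch: count only characters at bracket depth 0.
function getSubstrOutsideNested(str, len)
	local count = 0  -- characters counted outside all brackets
	local depth = 0  -- current bracket nesting depth
	for i = 1, #str do
		local ch = str:sub(i, i)
		if ch == "(" then
			depth = depth + 1
		elseif ch == ")" then
			-- assumption: a ")" with nothing open is skipped, not counted
			if depth > 0 then depth = depth - 1 end
		elseif depth == 0 then
			count = count + 1
			if count >= len then
				return str:sub(1, i)
			end
		end
	end
	return str
end

print(getSubstrOutsideNested("1234(xx()X(x))5678()90~~~~~~~", 10)) --> 1234(xx()X(x))5678()90
```

With an unclosed "(" the depth never returns to 0, so the loop simply runs to the end and the whole string comes back; it cannot hang.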

@user VADemon: I think my code is more straightforward. I didn't have to make a draft to write it. I also doubt that yours is more efficient (because find must iterate as well internally), but that probably doesn't matter at all in this case.
edited 1×, last 04.12.14 10:29:40 am

old Re: find length of string outside brackets

Torque
User Off Offline

Thanks for your suggested improvement DC. It wasn't a requirement to handle encapsulated brackets correctly. I expect correctly closed brackets of depth 1 in the string. But it shouldn't crash when the user input contains a typo.

I think that is a kind of standard requirement of any code: it must not get stuck in an endless loop under any circumstances.

old Re: find length of string outside brackets

DC
Admin Off Offline

Okay I see.. yes, that's true of course. At least if you can't be sure that the input is well-formed / as expected - which obviously is the case when using direct unchecked user input.

You could add simple error checking as well. Just do the changes I explained above and add these conditions to the loop:
if bracketDepth > 1 then MALFORMED_BRACKETS_ERROR ("(" used after still unclosed "(", a ")" is expected)
if bracketDepth < 0 then MALFORMED_BRACKETS_ERROR (")" without preceding unclosed "(")
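
The two conditions above could be wired into the loop roughly like this. It is only a sketch: the function name, the nil-plus-message return style and the final "never closed" check are assumptions added for illustration. Note that it returns as soon as the target count is reached, so malformed brackets after the cut point go unnoticed.

```lua
-- Sketch: depth-1 brackets only, malformed input is reported instead of guessed at.
function getSubstrChecked(str, len)
	local count, depth = 0, 0
	for i = 1, #str do
		local ch = str:sub(i, i)
		if ch == "(" then
			depth = depth + 1
			if depth > 1 then
				return nil, "malformed brackets: '(' at " .. i .. " while another '(' is still open"
			end
		elseif ch == ")" then
			depth = depth - 1
			if depth < 0 then
				return nil, "malformed brackets: ')' at " .. i .. " without a preceding '('"
			end
		elseif depth == 0 then
			count = count + 1
			if count >= len then return str:sub(1, i) end
		end
	end
	-- assumption: an unclosed "(" at the end of the input is also an error
	if depth ~= 0 then return nil, "malformed brackets: '(' never closed" end
	return str
end

print(getSubstrChecked("abc(123)def ghij klm (456) opqrs", 10)) --> abc(123)def ghi
print(getSubstrChecked("ab(c(123)", 5))
```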

old Re: find length of string outside brackets

Lee
Moderator Off Offline

Here's another algorithm that computes what you asked for:

-- Parse takes an action table t and computes the fixed point of this grammar + string
function parse(t)
    if t:terminate() then return end
    local peek = t.hash(string.char(t.stream:byte(t.at)))
    t.at = t.at + 1
    t[peek](t)
    return t
end

-- Stack of characters consumed grouped by bracketing level
local stack = {0}
-- Maximum number of characters to parse before breaking
local max_char = 10

function push() table.insert(stack, 1) end
function pop() assert(#stack > 1, 'Dangling )'); table.remove(stack, #stack) end
function count() stack[#stack] = stack[#stack] + 1 end

trace = parse {
    -- This is the string of characters we want to parse
    stream = "1234(xx()X(x))5678()90~~~~~~~",
    -- At the end of the run, at will point just past the 10th unbracketed character
    at = 1,
    -- This is a function that maps each character to an action label (so '(' = left, ')' = right, etc)
    hash = function(char) return ({['('] = 'left', [')'] = 'right', [''] = 'null'})[char] or 'character' end,
    -- What to do when we encounter a normal character
    character = function(t) count(); parse(t) end,
    -- What to do when we encounter a left parenthesis
    left = function(t) push(); parse(t) end,
    -- What to do when we encounter a right parenthesis
    right = function(t) pop(); parse(t) end,
    -- What to do when we encounter the end of the stream
    null = function(t) --[[Reduce stack]] assert(#stack == 1, 'Dangling (') end,
    -- The condition for us to terminate
    terminate = function(t) return stack[1] >= max_char end,
    -- What was parsed/consumed in this trace
    consumed = function(t) return t.stream:sub(1, t.at - 1) end}

print(trace:consumed())

This is created from a BNF grammar of the language of matching parentheses. This generalizes nicely: if you want to compute other properties of these types of strings, you can change the action associated with each character class from push/pop/count to other functions, because you effectively have the entire tree corresponding to the bracketing structure of the string.
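
As a small illustration of swapping the actions, the sketch below keeps the same "classify the character, then dispatch to an action" idea but computes the maximum nesting depth instead of a cut position. The function name and the table layout are made up for this example, and it uses a plain loop rather than the recursive parse above.

```lua
-- Hypothetical example: same dispatch idea, different actions.
function maxBracketDepth(stream)
	local depth, deepest = 0, 0
	-- maps a character to an action label, like hash in the code above
	local classify = { ["("] = "left", [")"] = "right" }
	local actions = {
		left = function()
			depth = depth + 1
			if depth > deepest then deepest = depth end
		end,
		right = function()
			assert(depth > 0, "Dangling )")
			depth = depth - 1
		end,
		character = function() end,  -- ordinary characters do not change the depth
	}
	for i = 1, #stream do
		local label = classify[stream:sub(i, i)] or "character"
		actions[label]()
	end
	assert(depth == 0, "Dangling (")
	return deepest
end

print(maxBracketDepth("1234(xx()X(x))5678()90~~~~~~~")) --> 2
```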

old Re: find length of string outside brackets

Flacko
User Off Offline

I still don't get what was wrong with the balanced bracket matching (%b), after tinkering with it for a while I got this (I must admit that my algorithm-fu is not the strongest):

function weird(str, n)
	--remember the position of the last right bracket
	local lastright = 0
	--find the next pair of balanced brackets
	local left, right = str:find("%b()")
	while left ~= right do
		--if there is a bracket between lastright and the next pair, the brackets are unbalanced
		if str:find("[%(%)]", lastright+1) < left then
			print("unbalanced brackets")
			return nil
		end
		--if we can return in this section, do so
		if left - lastright > n then
			return str:sub(1, lastright + n)
		end
		--decrease remaining characters
		n = n - (left - lastright - 1)
		lastright = right
		left, right = str:find("%b()", right)
	end
	if str:find("[%(%)]", lastright+1) then
		print("unbalanced brackets")
		return nil
	end
	if str:len() - lastright > n then
		return str:sub(1, lastright + n)
	end
	return str
end
Returns at most n characters outside brackets.
It should work with nested parentheses and should error when it finds an unbalanced pair of brackets that could affect the output.
edited 2×, last 12.12.14 03:56:27 pm