Excel, Functions and Formulas, Tutorial / Sample, Text and Strings #2 (Advanced)
Today when I was looking through the stackoverflow forum, I came up to this question: vba-clean-up-conditional-duplication-of-rows Though the question was actually VBA related I was able to find a solution that did not require any programming. Since the solution was challenging and interesting I thought I make a post about it. In this article I will provide a tutorial on how I came to the none VBA solution.
Please take note that this article is for the more advanced excel users, so if you are just learning excel I would suggest taking a look at some of the easier tutorials, Excel Functions and Formulas, Split String Based on Delimiter.
One of the main purposes of making this article is to explain the capabilities of the formulas in Excel. It shows that VBA is not always needed, and solutions to complex problems can also be obtained using Excel’s formulas and functions.
You can download the workbook related to this article here.
Jump To:
- The Problem
- Functions Used
- Step 1, Auxiliary Column C
- Step 2, Auxiliary Column E
- Step 3, Auxiliary Column F
- Step 4, Auxiliary Column G
- Step 5, Auxiliary Column D
- Step 6, Column A
- Step 7, Column B
- Step 8, Hide the Auxiliary Column
- Result
Contents
The Problem:
Basically there are text values written in column A of sheet1:
These text values can have 3 different formats. Based on the format of the text string, different values will be written in column A and B of sheet2. In the figure above you can see all 3 different types. Below I will explain what makes the 3 text values in the figure above different. Through out this article the figure above will be used as our sample input.
The first text string is “RICH01”. This text string does not contain the “/” the “-” character. Throughout this article text strings that don’t contain these 2 characters will be known as Text String Type 1 . These text strings are copied to the next empty cell in column A of sheet2. Also the number “1” is written in the cell adjacent to them in column B:
The next text value in sheet 1is “AJ01/LM03”. As you can see this text string contains the “/”, but does not contain the character “-“. Throughout this article text string that contain the “/’ character but not the “-” character will be know as Text String Type 2 . Unlike Text String Type 1 these text strings are split into 2 parts at the location of the “/” character. The two parts are printed in the next empty cell in column A of sheet 2. Also the value “0.5” is printed in the adjacent cell in column B:
The next text value is “AJ01-25/ST01-75”. It can be seen that this text string has both the characters “/” and “-“. Throughout this article text strings that contain both the “/” and the “-” will be know as Text String Type 3 . Similar to Text String Type 2, Text String Type 3 is split at the location of the “/” character. Also the numeric part after the “-” is separated from the rest of the text. The text parts are printed in the next empty cell in sheet 2 and the numeric part is divided by 100 and printed in the adjacent cell in column B
The final result in sheet 2 can be seen below:
Functions Used:
The following functions have been used in this tutorial:
It is assumed that the reader is familiar with the functions above.
Step 1, Auxiliary Column C:
In solving this problem several auxiliary columns have been used which later have been hidden. One of the challenges is to figuring out which of the 3 text string types each text value in sheet1 is:
The auxiliary columns help us in this matter.
The main purpose of column C is to check if the character “/” exists in the text string or not. If it exists it will return the location it was found. If it doesn’t exist it will return an empty cell (“”). Print the following formula in cell C2 and drag or copy it down to cover as many cells as needed in column C:
=IF(E2=1, IF(ISNUMBER(FIND("/",INDIRECT(G2))),FIND("/", INDIRECT(G2)), ""), "")
E2 = 1?
E2 is not“1” it would mean the current row in sheet2 is printing the second part of a Text string Type 2 or Text String Type3.
E2 being “1” could mean 2 things:
- The current row in sheet2 is printing a Text String Type 1
- The current row in sheet2 is printing the first part of a Text string Type 2 or Text String Type3
See auxiliary column E.
Print “” (Nothing)
If E2 is not “1” then current row in sheet2 is printing the second part of a Text string Type 2 or Text String Type3. Therefore the previous row E1 was equal to “1” and we have already calculated the location of the “/” character and there is no need to calculate it again. Therefore an empty cell is printed.
See auxiliary column E.
Is “/” in text
The formula FIND(“/”,INDIRECT(G2)) checks the text value in column A of sheet 1 for the “/” character. If it finds it, it will return a number representing the location it was found. If the character “/” doesn’t exist in the text value a non numeric value will be returned. ISNUMBER() checks if FIND(“/”,INDIRECT(G2)) returned a numeric value or a non numeric value and therefore if the character “/” has been found in the text value in sheet1 or not.
See auxiliary column G.
Print Location
If the character “/” was found in the text value in sheet1 then the location of the character is printed with the function FIND(“/”, INDIRECT(G2)).
See auxiliary column G.
Print “” (Nothing)
If the character “/” was not found in the text value in sheet1 then an empty cell is printed. The figure below shows the content of column C for the input data described in The Problem section:
Row 2 in sheet 2 is for the text value “RICH01” from row 1 of sheet1. “RICH01” does not have the “/” character therefore cell C2 is empty.
C3:
Row 3 in sheet 2 is for the text value “AJ01” which is the first part of the text string “AJ01/LM01” from row 2 of sheet1. “AJ01/LM01” contains the “/” character therefore the location of the character is printed in cell C3.
C4:
Row 4 in sheet 2 is for the text value “LM03” which is the second part of the text string “AJ01/LM01” from row 2 of sheet1. “AJ01/LM01” contains the “/” character but the index of the character has already been printed in the previous row, so it is not printed again in C4.
Step 2, Auxiliary Column E:
We need a column to map the current row in sheet 2 to the current row in sheet 1. As mentioned in The Problem section there 3 types of text values we might encounter.
- Text String Type 1: Only creates one row of data in sheet2
- Text string Type 2 and Text String Type3: Creates 2 rows of data in sheet2.
That is why we need a column to keep track of which row we are it in sheet1. The formula in column E accepts 2 values:
- The value “1”
- The value “2”
The value “1” could mean several things:
- The current row in sheet2 is printing the first part of a Text string Type 2 or Text String Type3.
- The current row in sheet2 is printing a Text String Type 1
The value “2” would mean that the current row in sheet2 is printing the second part of a Text string Type 2 or Text String Type3.
In cell E2 type the value “1”. In cell E3 write the following formula:
=IF(E2=1, IF(C2="", 1, 2),1)
Drag the formula down to cover as many rows as needed:
E2 = 1?
E2 is not “1” would mean that it is “2”. Therefore it would mean the previous row in sheet2 was printing the second part of a Text string Type 2 or Text String Type3.
Note: The flow chart above is for the cell E3, Therefore E2 is one row up.
See auxiliary column E.
Print “1”
If E2 was not “1” then it was “2”. That would mean 2 things:
- The previous text value in sheet 1 was a a Text string Type 2 or Text String Type3
- The previous row in sheet2 was printing the second part of the text value in sheet 1.
Therfore:
- The current row in sheet2 is printing the first part of a Text string Type 2 or Text StringType3.
- The current row in sheet2 is printing a Text String Type 1
which would mean the value in the current row in column E must be “1”.
C2 =””?
C2 being empty would mean the text value in sheet1 for the previous row in sheet2 did not have a “/” character and therefore was a Text String Type 1.
Print “1”
if E2 is “1” and “C2” is empty then that could mean two things:
- The current row in sheet2 is printing the first part of a Text string Type 2 or Text String Type3.
- The current row in sheet2 is printing a Text String Type 1
which would mean the value in the current row in column E must be “1”.
Print “2”
if E2 is “1” and “C2” is not empty that would mean the The current row in sheet2 is printing the second part of a Text string Type 2 or Text String Type3. Therefore the current row in column E must be “2”.
The figure below shows the content of column E for the input data described in The Problem section:
E2:
The first row is set to “1′ manually.
E3:
Row 3 in sheet 2 is for the text value “AJ01” which is the first part of the text string “AJ01/LM01” from row 2 of sheet1. Therefore it contains the value “1”.
E4:
Row 4 in sheet 2 is for the text value “LM03” which is the second part of the text string “AJ01/LM01” from row 2 of sheet1. Therefore it contains the value “2”.
Step 3, Auxiliary Column F:
As mentioned in the The Problem section some text strings from sheet1 require 2 rows on sheet2 while some text strings require only 1 row on sheet2. Therefor we would need to somehow map the rows in sheet 2 with the rows in sheet1. This is done by the auxiliary column F. The value “1” is written in F2 and the formula below is written in F3 and dragged /copied to the rest of the cells in column F:
=IF(E3=1, F2+1, F2)
The formula above checks if E3 is equal to “1” or not.
E3 = 1?
E3 being equal to “1” could mean 2 things:
- The current row in sheet2 is printing a Text String Type 1
- The current row in sheet2 is printing the first part of a Text string Type 2 or Text String Type3
In both cases it would mean that a new row has started in sheet 1 therefore the value in column F is incremented.
E3 not being equal to “1” would mean the following:
- E3 is equal to “2”
- We are working with the second part of the text string in sheet1
- We are still using the same test string in sheet1
- We are still working with the same row in sheet1.
Therefore the same value in the previous cell in column F is printed in the current row of column F.
See auxiliary column E.
The figure below shows the content of column F for the input data described in The Problem section:
The number “1” is manually set in this cell. Indicating this row in sheet 2 is getting its text string from row 1 in sheet 1.
F3:
Row 4 of sheet2 prints the first part of the text string “AJ01/LM03”, which is “Aj01”. The text string “AJ01/LM03” is retrieved from the second row in sheet 1. Therefore the value “2” is printed in F3
F4:
Row 4 of sheet2 prints the second part of the text string “AJ01/LM03”, which is “LM03”. The text string “AJ01/LM03” is retrieved from the second row in sheet 1. Therefore the value “2” is printed in F4.
Step 4, Auxiliary Column G:
Auxiliary column 4, column G displays the text representation of the reference to the cell that the text string must be retrieved from. This text representation is used later along with the INDIRECT function to return the value of that cell. This is the formula used in cell G2, which is later dragged / copied to the rest of the cells in column G:
="Sheet1!A" & F2
The formula above adds (concatenates) the text “Sheet1!A” with whatever value is in cell F2. The figure below shows the content of column G for the input data described in The Problem section:
G2:
Row 2 in sheet2 prints the text string “RICH01” which is retrieved from the cell “Sheet1!A1”.
G3:
Row 3 in sheet2 prints the text string “AJ01” which is the first part of the text string “AJ01/LM03”. The text string “AJ01/LM03” is retrieved from the cell “Sheet1!A2”
G4:
Row 4 in sheet2 prints the text string “LM03” which is the second part of the text string “AJ01/LM03”. The text string “AJ01/LM03” is retrieved from the cell “Sheet1!A2”
Step 5, Auxiliary Column D:
We need a column to determine whether the “-” character has been found or not. This is achieved using the following formula in cell D2:
=IF(E2=1, IF(ISNUMBER(FIND("-", INDIRECT(G2))),FIND("-", INDIRECT(G2)), ""),IF(D1 ="", "", FIND("-", INDIRECT(G2), D1+1)))
The formula is later copied / dragged to cover the rest of the cells in column D. The formula above has 2 paths:
Path 1:
E2 being “1” could mean 2 things:
- The current text string in this row of sheet1 is Text String Type 1.
- The current row in sheet 2 is for the first part of a text string of the type Text string Type 2 or Text String Type3
See Column E was was auxiliary column E
ISNUMBER(FIND(“-“, INDIRECT(G2)))
ISNUMBER checks if the input value is a number or not. The input value to the ISNUMBER function is the index of the first occurrence of the “-” character in the text string in sheet 1 associated with the current row of sheet2. For more information about the FIND function please see Excel Functions and Formulas Finding Text Within Another Text FIND.If the character “-” does not exist in the text string , then the FIND function will not return a numeric value, therefore ISNUMBER will return false.
Print “” (Nothing)
If the “-” character does not exist in the text string we want the cell to remain empty.
FIND(“-“, INDIRECT(G2))
If the “-” character does exist in the text string, we want the index of the first occurrence to be printed in the cell.
Path 2:
E2 =1?
E2 is not“1” then it would mean we are working with the second part of a Text string Type 2 or Text String Type3.
D1 = “”?
D1 = “” would mean that the previous row in sheet 2 did not find the character “-” in the text string in sheet 1.
Print “” (nothing)
E2 not being “1” and D1 being empty would mean the “-” character does not exist in the text string and therfore the current text string is a Text string Type 2. An empty value will be printed in the cell.
FIND(“-“, INDIRECT(G2), D1+1))
If the “-” character has been found in the previous row while working with the first part of the text string from sheet 1(before the “/” character) in this row the index of the second occurrence of the “-” character will be printed
The figure below shows the content of column D for the input data described in The Problem section:
D2:
Row 2 of sheet 2 is for the text string “RICH01” from row 1 of sheet 1. “RICHo1” does not contain the “-” therefore D2 is left empty.
D3:
Row 3 of sheet 2 is for the text string “AJ01” which is the first part of the text string “AJ01/LM03” from row 2 of sheet 1. “AJ01/LM03” does not contain the “-” character therefore D3 is left empty.
D4:
Row 4 of sheet 2 is for the text string “LM03” which is the second part of the text string “AJ01/LM03” from row 2 of sheet 1. “AJ01/LM03” does not contain the “-” character therefore D4 is left empty.
D5:
Row 5 of sheet 2 is for the text string “AJ01” which is the first part of the text string “AJ01-25/ST01-75” from row 3 of sheet 1. “AJ01-25/ST01-75” contains the “-” character. The first occurrence of the “-” character is at index 5 therefore 5 is printed in D4.
D6:
Row 6 of sheet 2 is for the text string “ST01” which is the second part of the text string “AJ01-25/ST01-75” from row 3 of sheet 1. “AJ01-25/ST01-75” contains the “-” character. The second occurrence of the “-” character is at index 13 therefore 13 is printed in D4.
Step 6, Column A:
For Text String Type 1., column A displays the entire text. For the other 2 types of text strings, column A displays the first and second part of the text string in separate rows.The following formula is written in A2 and dragged / copied to cover the rest of the cells:
=IF(E2=1, IF(D2="", IF(C2="", INDIRECT(G2), LEFT(INDIRECT(G2), C2-1)), LEFT(INDIRECT(G2), D2-1)), IF(D2="", RIGHT(INDIRECT(G2), LEN(INDIRECT(G2))-C1), MID(INDIRECT(G2), C1+1, D2 -1-C1)))
The formula above has 2 main paths.
Path 1:
E2 = 1?
E2 being “1” could mean 2 things:
- The current text string in this row of sheet1 is Text String Type 1.
- The current row in sheet 2 is for the first part of a text string of the type Text string Type 2 or Text String Type3
See Column E was was auxiliary column E
D2 = “”
D2 being empty would mean the “-” character was not found. If D2 is not “” (empty) then it would mean the “-” character has been found.
See Auxiliary Column D.
LEFT(INDIRECT(G2), C2-1)
If D2 is not empty, that would mean the “-” character has been found. LEFT(INDIRECT(G2), C2-1) returns a substring from the text in sheet1 associated with the current row in sheet2 up to the “-” character.
See auxiliary column C and auxiliary column G.
C2 =””
C2 being empty would mean the current text string does not have the “/” character, therefore the string in sheet1 associated with the current row in sheet 2 is a Text String Type 1.
INDIRECT(G2)
If C2 is empty, which would mean the current text string in sheet 1 is a Text String Type 1, the entire text string will be printed in the cell.
See auxiliary column G.
LEFT(INDIRECT(G2), C2-1)
If C2 is not empty which would mean mean the character “/” has been found in the text string current text string in sheet1, the first half (up to the “/” character) is printed.
See auxiliary column C and auxiliary column G.
Path 2:
E2 =1?
If E2 is not “1′ then it would be “2”. This would mean that the current row in sheet2 is working with the second part of a Text string Type 2 or Text String Type3.
See auxiliary column E.
D2 = “”?
D2 being equal to nothing would mean the “-” character has not been found.
See Auxiliary Column D.
MID(INDIRECT(G2), C1+1, D1-1-C1)
If D2 has a value in it, then it would mean the “-” character has been found. MID(INDIRECT(G2), C1+1, D1-1-C1), returns a substring starting one index after the “/” character and one index before the “-” character.
See auxiliary column C, auxiliary Column D and auxiliary column E.
RIGHT(INDIRECT(G2), LEN(INDIRECT(G2))-C1)
If D2 does not have a value in it, then RIGHT(INDIRECT(G2), LEN(INDIRECT(G2))-C1) returns a substring starting one index after the “/” character up to the end of the text string.
See auxiliary column C and auxiliary column G.
Step 7, Column B:
Column B was explained in the The Problem section:
- For Text String Type 1 it displays the number “1”.
- For Text string Type 2 it displays the number “0.5” in each of the rows.
- For Text String Type3 it displays the number after the “-” character divided by 100.
The formula below is written in cell B2 and dragged / copied to the rest of column B:
=IF(E2=1,IF(C2="",1,IF(D2="",0.5,MID(INDIRECT(G2),D2+1,C2-D2-1)/100)), IF(D2="",0.5,MID(INDIRECT(G2),D2+1, LEN(INDIRECT(G2))-D2)/100)
The formula above has 2 paths.
Path 1:
E2 being “1” could mean 2 things:
- The current text string in this row of sheet1 is Text String Type 1.
- The current row in sheet 2 is for the first part of a text string of the type Text string Type 2 or Text String Type3
See Column E was was auxiliary column E
C2 = “”?
C2 being empty would mean the current text string does not have the “/” character, therefore the string in sheet1 associated with the current row in sheet 2 is a Text String Type 1.
Print “1”
If E2 is “1” and C2 is empty that would mean the current text string is a Text String Type 1. Therefore the value “1” is printed in cell B2.
D2 = “”?
D2 being equal to nothing would mean the “-” character has not been found.
Print “0.5”
If E2 is “1” and C2 and D2 is empty, then the current text string is a Text string Type 2, therefore the value “0.5” is printed in cell B2.
MID(INDIRECT(G2),D2+1,C2-D2-1)/100
If E2 is “1” and C2 is empty but D2 is not empty then the current text string is a Text String Type3. Therefore MID(INDIRECT(G2),D2+1,C2-D2-1)/100 prints the number after the first “-” character divided by 100.
See auxiliary column C, auxiliary Column D and auxiliary column G.
Path 2:
E2 =1?
If E2 is not “1′ then it would be “2”. This would mean that the current row in sheet2 is working with the second part of a text string of type Text string Type 2 or Text String Type3.
See auxiliary column E.
D2 = “”?
D2 being equal to nothing would mean the “-” character has not been found.
See Auxiliary Column D.
Print “0.5”
If E2 is not “1” and D2 is empty then the current text string is a Text string Type 2, and the value “0.5” is written in column B.
MID(INDIRECT(G2),D2+1, LEN(INDIRECT(G2))-D2)/100
If E2 is not “1” and D2 is not empty then the current text string is a Text String Type3. E2 being “2” would also mean that we are working on the second part of the text string (the part after the “/” character). MID(INDIRECT(G2),D2+1, LEN(INDIRECT(G2))-D2)/100 prints the number after the second “-” character divided by 100.
Step 8, Hide the Auxiliary Columns:
You can hide the auxiliary columns so that only the first 2 columns are made visible:
Result:
Result:
Here is an example with more data on sheet1:
Sheet 1:
Sheet 2:
You can download the workbook related to this article here.
See also:
- Excel Functions and Formulas Finding Text Within Another Text, FIND()
- Excel Functions and Formulas Get Substring, LEFT(), MID(), RIGHT()
- Excel Functions and Formulas IF()
- Excel Functions and Formulas LEN()
- Excel Functions and Formulas INDIRECT()
- Check if Cell Contains Value
If you need assistance with your code, or you are looking for a VBA programmer to hire feel free to contact me. Also please visit my website www.software-solutions-online.com