T-SQL: Creating a hierarchical structure from relational data

Recently, while developing an SSIS package, I came across a scenario where I needed to create parent-child relationships between rows in a relational table. Let’s have a look at some sample data:

Source Data

Notice the values in the WBSCode column, 001 is the parent of all 001-XX rows, 001-02 is the parent of all 001-02-XX rows and so on. We need to create a relationship by populating the ParentID field so that the resulting structure can be traversed using a recursive CTE. Here’s a code snippet to create some sample data.

--create sample data
Declare @ProjectWBS TABLE
(
    ID int IDENTITY(1,1) NOT NULL Primary Key,
    ProjectCode varchar(20) NULL,
    WBSCode varchar(100) NULL,
    WBSDescription varchar(max) NULL,
    ParentID int NULL
)

INSERT @ProjectWBS (ProjectCode, WBSCode, WBSDescription, ParentID)
    Select 'Project1', '001', 'Phase 1', NULL
    union all Select 'Project1', '001-01', 'Project Mgmt', NULL
    union all Select 'Project1', '001-02', 'Requirements', NULL
    union all Select 'Project1', '001-02-01', 'Requirements Meeting', NULL
    union all Select 'Project1', '001-02-02', 'Requirements Write Up', NULL
    union all Select 'Project1', '001-03', 'Installation', NULL
    union all Select 'Project1', '001-03-01', 'Nav Install', NULL
    union all Select 'Project1', '001-03-02', 'WTE Install', NULL
    union all Select 'Project1', '001-03-03', 'Reporting Service', NULL
    union all Select 'Project1', '001-03-04', 'BI Install', NULL
    union all Select 'Project1', '001-04', 'Setup', NULL
    union all Select 'Project1', '001-04-01', 'Financials Setup', NULL
    union all Select 'Project1', '001-04-02', 'Projects Setup', NULL
    union all Select 'Project1', '001-04-03', 'Allocation Setup', NULL
    union all Select 'Project1', '001-04-04', 'WTE Setup', NULL
    union all Select 'Project1', '001-04-05', 'Users and Security', NULL
    union all Select 'Project1', '001-05', 'Training', NULL
    union all Select 'Project1', '001-06', 'Testing', NULL
    union all Select 'Project1', '001-06-01', 'Financial Testing', NULL
    union all Select 'Project1', '001-06-02', 'Project Testing', NULL
    union all Select 'Project1', '001-06-03', 'Project Allocation Testing', NULL
    union all Select 'Project1', '001-06-04', 'WTE Testing', NULL
    union all Select 'Project1', '001-06-05', 'Users and Security Testing', NULL
    union all Select 'Project1', '001-07', 'Onsite Go Live', NULL
    union all Select 'Project1', '001-08', 'Development', NULL
    union all Select 'Project1', '001-08-01', 'Overtime functionality', NULL
    union all Select 'Project1', '001-09', 'Data Migration', NULL
    union all Select 'Project1', '002', 'Phase 2', NULL
    union all Select 'Project1', '002-01', 'Monthly Support', NULL
    union all Select 'Project1', '002-01-02', 'Support Training', NULL
    union all Select 'Project1', '002-01-03', 'Support Account Mgmt', NULL

--Select * from @ProjectWBS

I started by creating the following function that returns no. of occurrences of a particular character in a given string:

CREATE FUNCTION fnc_CountChar(@string varchar(max), @char char(1) )
RETURNS INT
BEGIN
    RETURN (LEN(@string) - LEN(REPLACE(@string, @char, '')))
END

After this, I declared the following CTE:

--create hierarchical structure
;With WBSHierarchy
as
(
    Select Parent.ID, Parent.ProjectCode, Parent.WBSCode, Parent.WBSDescription,
        0 as Level, null as ParentID, Cast(null as varchar(MAX)) as ParentWBSCode
    From @ProjectWBS Parent
    Where dbo.fnc_CountChar(Parent.WBSCode, '-') = 0 --no hiphen "-" in parents

    union All

    Select Child.ID, Child.ProjectCode, Child.WBSCode, Child.WBSDescription,
        Parent.Level+1, Parent.ID as ParentID, Cast(Parent.WBSCode as varchar(max)) as ParentWBSCode
        From @ProjectWBS Child
        inner join WBSHierarchy Parent on Child.WBSCode Like Parent.WBSCode + '-%' -- childWBS = parentWBS-xx
            and dbo.fnc_CountChar(Child.WBSCode, '-') = Parent.Level + 1 --childs should have only one hiphen more than their parent
)

--Select * from WBSHierarchy order by WBSCode

Notice that in the above CTE definition, I considered all rows with single occurrence of “-” as the parent rows. Then, I used the fact that ChildWBS=ParentWBS-XX, and the no. of “-” occurrences in child WBS will be equal to one greater than its parent. Here’s the output of the above CTE.

Hierarchical Structure

Things were much simple after this, and I was able to write a single update query to populate the ParentID fields of all the rows in the original table as follows:

--update parentID
update @ProjectWBS
    set ParentID = H.ParentID
from @ProjectWBS W
inner join WBSHierarchy H on W.ID=H.ID

That’s all. Now the result can be traversed using the following recursive CTE:

--traverse the resulting structure using a recursive CTE
;With WBSHierarchy
As
(
    Select Parent.ID, Parent.ProjectCode, Parent.WBSCode, Parent.WBSDescription, 0 as Level
        from @ProjectWBS Parent
        where Parent.ParentID is null
    union all
    Select Child.ID, Child.ProjectCode, Child.WBSCode, Child.WBSDescription, Parent.Level+1 as Level
        from @ProjectWBS Child
        inner join WBSHierarchy Parent on Child.ParentID=Parent.ID
)

Select ProjectCode, WBSCode, space(Level*4)+WBSDescription as WBSDescription
from WBSHierarchy order by WBSCode

CTE Output

Update:
I got a very nice solution from Mitesh in the comments below. He used the fact that if we drop all the characters after the last “-” then we can get the parent WBS, i.e. 001-01-01 can be transformed into 001-01. Great find, Mitesh! Hence, we can do all the transformation without a recursive CTE or UDF like this:

;WITH CTE
AS
(
	SELECT
		WbsCode,
		SUBSTRING(WbsCode,1,LEN(WbsCode)-
			CASE 
				WHEN CHARINDEX('-',REVERSE(WbsCode)) > 0 
				THEN CHARINDEX('-',REVERSE(WbsCode)) 
			END  
		) AS EParentID ,
		ParentID
	FROM 
		@ProjectWBS
)

UPDATE
	C
SET
	PARENTID	= P.ID
FROM
	@ProjectWBS	P
INNER JOIN
	CTE	C
ON
	C.EParentID	=	P.WBSCode

Let me know if you have any other solution to such scenario.

Advertisements

2 Responses to “T-SQL: Creating a hierarchical structure from relational data”

  1. Mitesh Oswal Says:

    To update parentID can we used following code

    ;WITH CTE
    AS
    (
    	SELECT 
    		WbsCode,SUBSTRING(WbsCode,1,	LEN(WbsCode)-CASE WHEN CHARINDEX('-',REVERSE(WbsCode)) > 0 THEN CHARINDEX('-',REVERSE(WbsCode)) END ) AS EParentID ,
    	ParentID
    FROM 
    	@ProjectWBS
    )
    
    UPDATE
    	C
    SET
    	PARENTID	= P.ID	
    FROM
    	@ProjectWBS	P
    INNER JOIN
    	CTE	C
    ON
    	C.EParentID	=	P.WBSCode
    
  2. Syed Mehroz Alam Says:

    Great, Mitesh! I wonder why I couldn’t thought about string manipulation. I will have the post updated with this approach.
    Thanks again.


Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

%d bloggers like this: