Finding all the joins required to programmatically join to a table

Given a SourceTable and a TargetTable, I would like to programmatically create a string with all the joins required.

In short, I'm trying to find a way to create a string like this:

FROM SourceTable t
JOIN IntermediateTable t1 on t1.keycolumn = t.keycolumn
JOIN TargetTable t2 on t2.keycolumn = t1.keycolumn

I have a query that returns all the foreign keys for a given table, but I'm running into limitations when trying to run it recursively to find the optimal join path and build the string.

SELECT 
    p.name AS ParentTable
    ,pc.name AS ParentColumn
    ,r.name AS ChildTable
    ,rc.name AS ChildColumn
FROM sys.foreign_key_columns fk
JOIN sys.columns pc ON pc.object_id = fk.parent_object_id AND pc.column_id = fk.parent_column_id 
JOIN sys.columns rc ON rc.object_id = fk.referenced_object_id AND rc.column_id = fk.referenced_column_id
JOIN sys.tables p ON p.object_id = fk.parent_object_id
JOIN sys.tables r ON r.object_id = fk.referenced_object_id
WHERE fk.parent_object_id = OBJECT_ID('aTable')
ORDER BY ChildTable, fk.referenced_column_id
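
The query above returns only the foreign keys declared on 'aTable' itself. A join path may also have to follow keys that are declared on other tables and point at it. A minimal sketch of that reverse lookup, assuming the same placeholder name 'aTable' and the same column aliases as the query above:

```sql
-- Foreign keys declared on other tables that reference 'aTable'
-- ('aTable' is the same placeholder table name used in the query above)
SELECT
    p.name AS ParentTable
    ,pc.name AS ParentColumn
    ,r.name AS ChildTable
    ,rc.name AS ChildColumn
FROM sys.foreign_key_columns fk
JOIN sys.columns pc ON pc.object_id = fk.parent_object_id AND pc.column_id = fk.parent_column_id
JOIN sys.columns rc ON rc.object_id = fk.referenced_object_id AND rc.column_id = fk.referenced_column_id
JOIN sys.tables p ON p.object_id = fk.parent_object_id
JOIN sys.tables r ON r.object_id = fk.referenced_object_id
-- Filter on the referenced side instead of the parent side
WHERE fk.referenced_object_id = OBJECT_ID('aTable')
ORDER BY ParentTable, fk.parent_column_id;
```

Combining both directions gives the full set of edges that a path search could follow from a single table.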

I'm sure this has been done before, but I can't find an example.

— Metaphor

What if there are 2 or more paths from the source to the target?

— ypercubeᵀᴹ

Yes, I'd be concerned about multiple potential paths, and also about a single path that is more than 2 steps long. Also, keys made up of more than one column. These scenarios will all throw a wrench into any automated solution.

— Aaron Bertrand

Note that even a single foreign key between two tables will allow 2 or more paths (actually an unlimited number of paths of arbitrary length). Consider the query "find all items that have been placed at least once in the same order as item X". You'll need to join OrderItems with Orders and then back with OrderItems.
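
A minimal sketch of such a query, to make the repeated traversal concrete. The schema is hypothetical (Orders(OrderId) and OrderItems(OrderId, ItemId) are assumed column names, and @itemX is an illustrative parameter):

```sql
-- Hypothetical schema: Orders(OrderId), OrderItems(OrderId, ItemId)
DECLARE @itemX INT = 42;  -- illustrative item id, not from the original thread

SELECT DISTINCT other.ItemId
FROM OrderItems x
JOIN Orders o ON o.OrderId = x.OrderId               -- OrderItems -> Orders
JOIN OrderItems other ON other.OrderId = o.OrderId   -- Orders -> OrderItems again
WHERE x.ItemId = @itemX
    AND other.ItemId <> @itemX;
```

The same foreign key is followed twice, once in each direction, so OrderItems appears at both ends of the path.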

— ypercubeᵀᴹ

@ypercube Right, and also, what exactly does "the optimal path" mean?

— Aaron Bertrand

"Optimal JOIN path" means "the shortest series of joins that will join the Target table to the Source table". Say T1 references T2 and T3, T2 references T4, and T3 references T4. The optimal path from T1 to T3 is T1, T2, T3. The path T1, T2, T4, T3 would not be optimal because it is longer.

— Metaphor

Answers:

I had a script that does a rudimentary version of foreign key traversal. I adapted it quickly (see below), and you might be able to use it as a starting point.

Given a target table, the script attempts to print the join string for the shortest path (or one of them in the case of ties) for every possible source table from which the target table can be reached by traversing single-column foreign keys. The script seems to work well on the database I tried it on, which has a couple of thousand tables and many FK connections.

As others mention in the comments, you would need to make this more complex to handle multi-column foreign keys. Also, be aware that this is by no means production-ready, fully tested code. I hope it's a helpful starting point if you decide to build out this functionality!

-- Drop temp tables that will be used below
IF OBJECT_ID('tempdb..#paths') IS NOT NULL
    DROP TABLE #paths
GO
IF OBJECT_ID('tempdb..#shortestPaths') IS NOT NULL
    DROP TABLE #shortestPaths
GO

-- The table (e.g. "TargetTable") to start from (or end at, depending on your point of view)
DECLARE @targetObjectName SYSNAME = 'TargetTable'

-- Identify all paths from TargetTable to any other table on the database,
-- counting all single-column foreign keys as a valid connection from one table to the next
;WITH singleColumnFkColumns AS (
    -- We limit the scope of this exercise to single column foreign keys
    -- We explicitly filter out any multi-column foreign keys to ensure that they aren't misinterpreted below
    SELECT fk1.*
    FROM sys.foreign_key_columns fk1
    LEFT JOIN sys.foreign_key_columns fk2 ON fk2.constraint_object_id = fk1.constraint_object_id AND fk2.constraint_column_id = 2
    WHERE fk1.constraint_column_id = 1
        AND fk2.constraint_object_id IS NULL
)
, parentCTE AS (
    -- Base case: Find all outgoing (pointing into another table) foreign keys for the specified table
    SELECT 
        p.object_id AS ParentId
        ,OBJECT_SCHEMA_NAME(p.object_id) + '.' + p.name AS ParentTable
        ,pc.column_id AS ParentColumnId
        ,pc.name AS ParentColumn
        ,r.object_id AS ChildId
        ,OBJECT_SCHEMA_NAME(r.object_id) + '.' + r.name AS ChildTable
        ,rc.column_id AS ChildColumnId
        ,rc.name AS ChildColumn
        ,1 AS depth
        -- Maintain the full traversal path that has been taken thus far
        -- We use "," to delimit each table, and each entry then has a
        -- "<object_id>_<parent_column_id>_<child_column_id>" format
        ,   ',' + CONVERT(VARCHAR(MAX), p.object_id) + '_NULL_' + CONVERT(VARCHAR(MAX), pc.column_id) +
            ',' + CONVERT(VARCHAR(MAX), r.object_id) + '_' + CONVERT(VARCHAR(MAX), pc.column_id) + '_' + CONVERT(VARCHAR(MAX), rc.column_id) AS TraversalPath
    FROM singleColumnFkColumns fk -- apply the same single-column filter as the recursive case
    JOIN sys.columns pc ON pc.object_id = fk.parent_object_id AND pc.column_id = fk.parent_column_id 
    JOIN sys.columns rc ON rc.object_id = fk.referenced_object_id AND rc.column_id = fk.referenced_column_id
    JOIN sys.tables p ON p.object_id = fk.parent_object_id
    JOIN sys.tables r ON r.object_id = fk.referenced_object_id
    WHERE fk.parent_object_id = OBJECT_ID(@targetObjectName)
        AND p.object_id <> r.object_id -- Ignore FKs from one column in the table to another

    UNION ALL

    -- Recursive case: Find all outgoing foreign keys for all tables
    -- on the current fringe of the recursion
    SELECT 
        p.object_id AS ParentId
        ,OBJECT_SCHEMA_NAME(p.object_id) + '.' + p.name AS ParentTable
        ,pc.column_id AS ParentColumnId
        ,pc.name AS ParentColumn
        ,r.object_id AS ChildId
        ,OBJECT_SCHEMA_NAME(r.object_id) + '.' + r.name AS ChildTable
        ,rc.column_id AS ChildColumnId
        ,rc.name AS ChildColumn
        ,cte.depth + 1 AS depth
        ,cte.TraversalPath + ',' + CONVERT(VARCHAR(MAX), r.object_id) + '_' + CONVERT(VARCHAR(MAX), pc.column_id) + '_' + CONVERT(VARCHAR(MAX), rc.column_id) AS TraversalPath
    FROM parentCTE cte
    JOIN singleColumnFkColumns fk
        ON fk.parent_object_id = cte.ChildId
        -- Optionally consider only a traversal of the same foreign key
        -- With this commented out, we can reach table A via column A1
        -- and leave table A via column A2.  If uncommented, we can only
        -- enter and leave a table via the same column
        --AND fk.parent_column_id = cte.ChildColumnId
    JOIN sys.columns pc ON pc.object_id = fk.parent_object_id AND pc.column_id = fk.parent_column_id 
    JOIN sys.columns rc ON rc.object_id = fk.referenced_object_id AND rc.column_id = fk.referenced_column_id
    JOIN sys.tables p ON p.object_id = fk.parent_object_id
    JOIN sys.tables r ON r.object_id = fk.referenced_object_id
    WHERE p.object_id <> r.object_id -- Ignore FKs from one column in the table to another
        -- If our path has already taken us to this table, avoid the cycle that would be created by returning to the same table
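        -- Match the ",<object_id>_" prefix of a path entry exactly; [_] escapes LIKE's single-character wildcard
        AND cte.TraversalPath NOT LIKE ('%,' + CONVERT(VARCHAR(MAX), r.object_id) + '[_]%')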
)
SELECT *
INTO #paths
FROM parentCTE
ORDER BY depth, ParentTable, ChildTable
GO

-- For each distinct table that can be reached by traversing foreign keys,
-- record the shortest path to that table (or one of the shortest paths in
-- case there are multiple paths of the same length)
SELECT *
INTO #shortestPaths
FROM (
    SELECT *, ROW_NUMBER() OVER (PARTITION BY ChildTable ORDER BY depth ASC) AS rankToThisChild
    FROM #paths
) x
WHERE rankToThisChild = 1
ORDER BY ChildTable
GO

-- Traverse each shortest path, starting from the table at the end of the full path and working backwards,
-- building up the desired join string as we go
WITH joinCTE AS (
    -- Base case: Start with the from clause to the child table at the end of the traversal
    -- Note that the first step of the recursion will re-process this same row, but adding
    -- the ParentTable => ChildTable join
    SELECT p.ChildTable
        , p.TraversalPath AS ParentTraversalPath
        , NULL AS depth
        , CONVERT(VARCHAR(MAX), 'FROM ' + p.ChildTable + ' t' + CONVERT(VARCHAR(MAX), p.depth+1)) AS JoinString
    FROM #shortestPaths p

    UNION ALL

    -- Recursive case: Process the ParentTable => ChildTable join, then recurse to the
    -- previous table in the full traversal.  We'll end once we reach the root and the
    -- "ParentTraversalPath" is the empty string
    SELECT cte.ChildTable
        , REPLACE(p.TraversalPath, ',' + CONVERT(VARCHAR, p.ChildId) + '_' + CONVERT(VARCHAR, p.ParentColumnId)+ '_' + CONVERT(VARCHAR, p.ChildColumnId), '') AS TraversalPath
        , p.depth
        , cte.JoinString + '
' + CONVERT(VARCHAR(MAX), 'JOIN ' + p.ParentTable + ' t' + CONVERT(VARCHAR(MAX), p.depth) + ' ON t' + CONVERT(VARCHAR(MAX), p.depth) + '.' + p.ParentColumn + ' = t' + CONVERT(VARCHAR(MAX), p.depth+1) + '.' + p.ChildColumn) AS JoinString
    FROM joinCTE cte
    JOIN #paths p
        ON p.TraversalPath = cte.ParentTraversalPath
)
-- Select only the fully built strings that end at the root of the traversal
-- (which should always be the specific table name, e.g. "TargetTable")
SELECT ChildTable, 'SELECT TOP 100 * 
' +JoinString
FROM joinCTE
WHERE depth = 1
ORDER BY ChildTable
GO
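
Because the script above only follows single-column foreign keys, it can help to see which constraints it will skip. A minimal sketch, not part of the original script, that lists the multi-column foreign keys in the current database (the output column aliases are illustrative):

```sql
-- List foreign keys that span more than one column
SELECT
    OBJECT_SCHEMA_NAME(fk.parent_object_id) + '.' + OBJECT_NAME(fk.parent_object_id) AS ParentTable
    ,fk.name AS ForeignKeyName
    ,OBJECT_SCHEMA_NAME(fk.referenced_object_id) + '.' + OBJECT_NAME(fk.referenced_object_id) AS ReferencedTable
    ,COUNT(*) AS ColumnCount
FROM sys.foreign_keys fk
JOIN sys.foreign_key_columns fkc ON fkc.constraint_object_id = fk.object_id
GROUP BY fk.object_id, fk.name, fk.parent_object_id, fk.referenced_object_id
HAVING COUNT(*) > 1
ORDER BY ParentTable, ForeignKeyName;
```

If this returns rows for tables on the path you care about, the generated join strings may miss a shorter route or omit part of a join condition.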

— Geoff Patterson

You can store the list of keys in a table with two fields, TAB_NAME and KEY_NAME (called Table_name and Key_name in the query below), for all the tables you want to connect.

For example, for the City table:

City | CITY_NAME
City | COUNTRY_NAME
City | Province_name
City | City_Code

and similarly for Province and Country.

Collect the data for all the tables and put it in a single table (e.g. a metadata table, Meta_Data).
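
A minimal sketch of what such a metadata table could look like. The column names follow the query in this answer; the data types and lengths are assumptions, and the rows are the City example from above:

```sql
-- Hypothetical definition; types and lengths are assumptions
CREATE TABLE Meta_Data (
    Table_name VARCHAR(128) NOT NULL,
    Key_name   VARCHAR(128) NOT NULL
);

-- Key list for the City table, as given in the example above
INSERT INTO Meta_Data (Table_name, Key_name)
VALUES
    ('City', 'CITY_NAME'),
    ('City', 'COUNTRY_NAME'),
    ('City', 'Province_name'),
    ('City', 'City_Code');
-- Rows for Province and Country would be added in the same way
```

Whether 'Province_name' matches 'PROVINCE_NAME' in another table depends on the collation of the Key_name column, so consistent casing is safer.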

Now draft the query as below:

select * from
(Select Table_name,Key_name from Meta_Data 
where Table_name in ('City','Province','Country')) A,
(Select Table_name,Key_name from Meta_Data 
where Table_name in ('City','Province','Country')) B,
(Select Table_name,Key_name from Meta_Data 
where Table_name in ('City','Province','Country')) C

where

A.Table_Name <> B.Table_name and
B.Table_name <> C.Table_name and
C.Table_name <> A.Table_name and
A.Key_name = B.Key_name and
B.Key_name = C.Key_name

This will let you link the tables based on matching keys (same key names).

If you think the key names might not match, you can include an alternate key field and try to use it in the where condition.

— I44

Note that the asker wanted to use the existing sys tables in SQL Server, which describe the columns in a table, how tables are linked together, and so on. All of that already exists. Building your own tables that describe your table structure to meet a specific need might be a fallback position, but the preferred answer will use what already exists, as the accepted answer does.

— RDFozz