Understanding PostgreSQL UPSERT with ON CONFLICT
PostgreSQL provides a powerful feature called ON CONFLICT
for handling duplicate key violations during data insertion. In this post, we’ll explore the following query:
INSERT INTO grp_nodes (group_id, node_id, x, y)
VALUES
(1, 101, 10.0, 20.0),
(2, 102, 30.0, 40.0),
(3, 103, 50.0, 60.0)
ON CONFLICT (group_id, node_id)
DO UPDATE SET
x = EXCLUDED.x,
y = EXCLUDED.y;
This query illustrates how to perform a conditional insert or update, also known as an “upsert” operation. Let’s break it down in detail.
The Table Structure
Before running this query, assume the table grp_nodes
is defined as:
CREATE TABLE grp_nodes (
group_id INT NOT NULL,
node_id INT NOT NULL,
x FLOAT NOT NULL,
y FLOAT NOT NULL,
PRIMARY KEY (group_id, node_id)
);
group_id
andnode_id
form a composite primary key, ensuring uniqueness across these two columns.- The columns
x
andy
store coordinate data for each node.
Query Explanation
1. INSERT INTO
Clause
This clause specifies the target table (grp_nodes
) and the columns to populate:
INSERT INTO grp_nodes (group_id, node_id, x, y)
The subsequent VALUES
section provides the data to insert:
VALUES
(1, 101, 10.0, 20.0),
(2, 102, 30.0, 40.0),
(3, 103, 50.0, 60.0)
Each row represents a node with its group_id
, node_id
, and coordinates (x
, y
).
2. ON CONFLICT
Clause
The ON CONFLICT
clause specifies how to handle violations of the primary key constraint (composite key of group_id
and node_id
) when inserting data. Here, it targets the conflict on (group_id, node_id)
:
ON CONFLICT (group_id, node_id)
This ensures that if a row with the same group_id
and node_id
already exists, the database will not throw an error.
3. DO UPDATE SET
Clause
If a conflict occurs, the DO UPDATE SET
clause determines how to update the existing row:
DO UPDATE SET
x = EXCLUDED.x,
y = EXCLUDED.y;
- The keyword
EXCLUDED
refers to the row that caused the conflict. - In this case, the conflicting row’s
x
andy
values are updated with the new values provided in theVALUES
clause.
Complete Query Flow
- PostgreSQL attempts to insert each row.
- If no conflict occurs, the row is inserted as-is.
- If a conflict occurs on
(group_id, node_id)
, the existing row is updated with the newx
andy
values.
Example Execution
Initial Table State
group_id | node_id | x | y |
---|---|---|---|
1 | 101 | 5.0 | 10.0 |
Query Execution
When the query runs, here’s what happens:
- The first row
(1, 101, 10.0, 20.0)
encounters a conflict because(1, 101)
already exists. The existing row’sx
andy
values are updated to10.0
and20.0
, respectively. - The second and third rows
(2, 102, 30.0, 40.0)
and(3, 103, 50.0, 60.0)
are inserted successfully because they do not conflict with existing rows.
Final Table State
group_id | node_id | x | y |
---|---|---|---|
1 | 101 | 10.0 | 20.0 |
2 | 102 | 30.0 | 40.0 |
3 | 103 | 50.0 | 60.0 |
Key Benefits of ON CONFLICT
-
Atomicity:
- The operation ensures that insertion and updates are performed atomically, avoiding race conditions in concurrent environments.
-
Simplicity:
- Replaces the need for complex
MERGE
orUPSERT
logic often required in other databases.
- Replaces the need for complex
-
Efficiency:
- Avoids unnecessary updates or deletes when rows already match the desired state.
Practical Applications
- Data Synchronization: Updating existing records with new data while inserting missing entries.
- Event Tracking: Storing the latest state of entities in real-time systems.
- Geospatial Updates: Updating coordinates for nodes in mapping or network topology applications.
Conclusion
This query is an elegant solution for handling duplicate key conflicts in PostgreSQL. By leveraging the ON CONFLICT
clause, developers can streamline their workflows and write efficient, maintainable SQL for upsert operations.