Much the same as you'd implement it on any other platform?
In general: http://www.8052mcu.com/forum/read/160143
CANFestival is a good starting point for basic services. You'll find the STM32 16-bit CAN filtering is ideal for routing CANopen services, especially for high priority services like NMT heartbeat and EMCY alert messages. I recommend using one FIFO for high priority and the second for normal priority, so that you minimize response time for events like emergency shutdown.
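As a rough sketch of that routing (not taken from CANFestival), the helpers below classify an 11-bit COB-ID by its function code and pack an ID into the 16-bit filter layout. The names `route_cob_id` and `filter16` are made up for illustration. The bit layout (STID in bits 15:5, RTR in bit 4, IDE in bit 3) is how I understand the bxCAN 16-bit filter scale, so check it against the reference manual for your part.

```c
#include <stdint.h>

/* CANopen predefined connection set: COB-ID = function code base + node ID. */
#define COB_NMT        0x000u  /* NMT command (broadcast) */
#define COB_EMCY_BASE  0x080u  /* SYNC is 0x080, EMCY is 0x080 + node ID */
#define COB_HB_BASE    0x700u  /* heartbeat is 0x700 + node ID */
#define COB_FUNC_MASK  0x780u  /* upper 4 bits select the function code */

enum rx_fifo { FIFO_HIGH = 0, FIFO_NORMAL = 1 };

/* Pack a standard ID into the assumed bxCAN 16-bit filter layout:
   STID[10:0] in bits 15:5, RTR in bit 4, IDE in bit 3 (0 = standard frame). */
static uint16_t filter16(uint16_t std_id, unsigned rtr)
{
    return (uint16_t)(((std_id & 0x7FFu) << 5) | ((rtr & 1u) << 4));
}

/* Decide which receive FIFO a COB-ID should be routed to. */
static enum rx_fifo route_cob_id(uint16_t cob_id)
{
    uint16_t func = cob_id & COB_FUNC_MASK;

    if (func == COB_NMT || func == COB_EMCY_BASE || func == COB_HB_BASE)
        return FIFO_HIGH;    /* NMT, SYNC/EMCY, heartbeat */
    return FIFO_NORMAL;      /* PDO, SDO, everything else */
}
```

In mask mode, an ID of `filter16(0x700, 0)` with a mask of `filter16(0x780, 0)` would then match heartbeats from every node with a single filter entry, assigned to the high-priority FIFO.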
Also, be sure to enable the error/status interrupt. If something goes wrong you need the information on bus events and error counts. They are essential for reliable operation, especially during startup.
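To give an idea of what the error/status interrupt handler has to work with, here is a minimal sketch that unpacks a raw error status word. It assumes the bxCAN CAN_ESR layout as I remember it (EWGF bit 0, EPVF bit 1, BOFF bit 2, LEC in bits 6:4, TEC in bits 23:16, REC in bits 31:24). The struct and the function name `decode_esr` are hypothetical, and reading the register itself is left out.

```c
#include <stdint.h>
#include <stdbool.h>

struct can_err_status {
    bool    warning;   /* EWGF: error warning limit reached */
    bool    passive;   /* EPVF: node is error passive */
    bool    bus_off;   /* BOFF: node has gone bus-off */
    uint8_t last_err;  /* LEC: 0 none, 1 stuff, 2 form, 3 ack, 4/5 bit, 6 CRC */
    uint8_t tec;       /* transmit error counter */
    uint8_t rec;       /* receive error counter */
};

/* Unpack a raw CAN_ESR value (assumed layout, see above). */
static struct can_err_status decode_esr(uint32_t esr)
{
    struct can_err_status s;

    s.warning  = (esr & (1u << 0)) != 0;
    s.passive  = (esr & (1u << 1)) != 0;
    s.bus_off  = (esr & (1u << 2)) != 0;
    s.last_err = (uint8_t)((esr >> 4) & 0x7u);
    s.tec      = (uint8_t)((esr >> 16) & 0xFFu);
    s.rec      = (uint8_t)((esr >> 24) & 0xFFu);
    return s;
}
```

A last error code of 3 (ack error) together with a rising TEC is what you would expect to see from a node that is transmitting with nobody else on the bus.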
One difficulty you'll run into is an inactive bus. If your device is detached from the bus, or the bus is shut down, or you're bringing up the network one node at a time, you will see continuous errors until there are two working nodes, because a lone node never gets its frames acknowledged. I got around this by adapting an Ethernet-like collision strategy, logically disconnecting the device from the bus for random time intervals. Intervals increase over time so you're guaranteed an overlap for startup of the first two nodes.
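One way to sketch such a backoff is shown below. It is a guess at the general shape, not the exact scheme described above: it borrows truncated binary exponential backoff from Ethernet, picking a random offline time from a window that doubles with each consecutive failed attempt. The slot time, the cap, and the names `lcg_next` and `backoff_ms` are all assumptions for illustration.

```c
#include <stdint.h>

#define BACKOFF_BASE_MS   10u  /* hypothetical slot time */
#define BACKOFF_MAX_SHIFT 8u   /* cap the window at 2^8 slots */

/* Small linear congruential generator. Seed it differently on each node
   (for example from the node ID) so two nodes do not stay in lockstep. */
static uint32_t lcg_next(uint32_t *state)
{
    *state = *state * 1664525u + 1013904223u;
    return *state;
}

/* Offline time in ms after `attempt` consecutive failed attempts.
   Returns a multiple of the slot time between 1 and 2^shift slots. */
static uint32_t backoff_ms(unsigned attempt, uint32_t *rng)
{
    unsigned shift = attempt < BACKOFF_MAX_SHIFT ? attempt : BACKOFF_MAX_SHIFT;
    uint32_t slots = 1u << shift;  /* window doubles per attempt */

    return BACKOFF_BASE_MS * (1u + (lcg_next(rng) >> 16) % slots);
}
```

The error/status handler would call `backoff_ms()` when it sees the node failing to get acknowledged, take the controller offline for that long, then rejoin the bus and reset the attempt count after the first successful transmission.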
And yes, the CANopen documentation is the textbook of record for any implementation. Start with basic services: NMT heartbeat; EMCY if you need a fast process shutdown (like a heartbeat failure on a critical node); SDO infrastructure and PDO data broadcasts for the actual workload. Depending on how you structure your PDOs you may need SYNC as well to trigger a synchronized burst of data PDOs from several nodes.
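To make the first two services concrete, here is a minimal sketch of building heartbeat and EMCY frames using the predefined connection set. The `co_frame` struct and the function names are hypothetical and not CANFestival's API; verify the byte layouts against CiA 301 before relying on them.

```c
#include <stdint.h>

/* Hypothetical frame container, not tied to any driver or stack. */
struct co_frame {
    uint16_t id;       /* 11-bit COB-ID */
    uint8_t  dlc;      /* data length */
    uint8_t  data[8];
};

/* NMT states as reported in the heartbeat payload. */
enum nmt_state {
    NMT_BOOTUP      = 0x00,
    NMT_STOPPED     = 0x04,
    NMT_OPERATIONAL = 0x05,
    NMT_PREOP       = 0x7F
};

/* Heartbeat: COB-ID 0x700 + node ID, one byte holding the NMT state. */
static struct co_frame make_heartbeat(uint8_t node_id, enum nmt_state st)
{
    struct co_frame f = {0};

    f.id = (uint16_t)(0x700u + (node_id & 0x7Fu));
    f.dlc = 1;
    f.data[0] = (uint8_t)st;
    return f;
}

/* EMCY: COB-ID 0x080 + node ID, 8 bytes: error code (little endian),
   error register, then 5 manufacturer-specific bytes (left zero here). */
static struct co_frame make_emcy(uint8_t node_id, uint16_t err_code,
                                 uint8_t err_reg)
{
    struct co_frame f = {0};

    f.id = (uint16_t)(0x080u + (node_id & 0x7Fu));
    f.dlc = 8;
    f.data[0] = (uint8_t)(err_code & 0xFFu);
    f.data[1] = (uint8_t)(err_code >> 8);
    f.data[2] = err_reg;
    return f;
}
```

For example, `make_emcy(5, 0x8130, 0x11)` would build the alert a consumer on node 5 might raise for a heartbeat failure, with the generic and communication bits set in the error register.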
You can hold off on other services like TIME, LSS and SRDO/global fault until you get some experience.
Another gotcha: bear in mind that if you support the two-level message priority, messages may appear to arrive out of order. An EMCY alert from a failing node may be received before that node's PDO data broadcast, even though the EMCY was sent after the PDO.