Ceph Cheat Sheet for Linux Sysadmins

A quick reference guide for Linux sysadmins managing Ceph clusters.


🔹 Ceph Basics

  • Cluster Status:
    ceph status
    ceph -s              # Short form
    
  • Check Ceph Health:
    ceph health detail
    
  • Cluster Configuration Dump:
    ceph config dump
    

🔹 Monitor & Logs

  • View Monitor Map:
    ceph mon dump
    
  • Monitor Logs:
    journalctl -u ceph-mon@<host> -f
    

🔹 OSD Management

  • List OSDs:
    ceph osd tree
    ceph osd ls
    
  • OSD Usage:
    ceph osd df
    ceph osd utilization
    
  • Mark OSD Out / In:
    ceph osd out <osd_id>
    ceph osd in <osd_id>
    
  • Stop / Start OSD Service:
    systemctl stop ceph-osd@<id>
    systemctl start ceph-osd@<id>
    
  • Remove OSD:
    ceph osd purge <osd_id> --yes-i-really-mean-it
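
  • Check Removal Safety (an optional check before purging; these subcommands exist on recent Ceph releases, and <osd_id> is a placeholder):
    ceph osd ok-to-stop <osd_id>          # can this OSD stop without reducing availability?
    ceph osd safe-to-destroy <osd_id>     # can this OSD be purged without risking data?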
    

🔹 Pool Management

  • List Pools:
    ceph osd pool ls
    
  • Create a Pool (see also the follow-up example below):
    ceph osd pool create mypool 128 128 replicated   # <pool> <pg_num> <pgp_num> <type>
    
  • Delete a Pool:
    ceph osd pool delete mypool mypool --yes-i-really-really-mean-it   # requires mon_allow_pool_delete=true
    
  • Get Pool Stats:
    ceph df
    ceph osd pool stats mypool
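
  • Adjust a New Pool (a follow-up sketch; the pool name and the rbd application are examples only):
    ceph osd pool set mypool size 3                  # replica count
    ceph osd pool application enable mypool rbd      # tag the pool with its intended application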
    

🔹 PG (Placement Groups)

  • Check PGs:
    ceph pg stat
    ceph pg dump | less
    
  • Repair PG:
    ceph pg repair <pg_id>
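
  • Find Problem PGs (a sketch; <pg_id> is a placeholder, and list-inconsistent-obj needs a recent scrub of that PG):
    ceph pg dump_stuck                                          # PGs stuck inactive/unclean/stale
    rados list-inconsistent-obj <pg_id> --format=json-pretty    # objects behind an inconsistent PG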
    

🔹 CRUSH Map

  • View CRUSH Map:
    ceph osd crush tree
    ceph osd crush dump | less
    
  • Edit CRUSH Map:
    ceph osd getcrushmap -o map.bin
    crushtool -d map.bin -o map.txt
    # edit map.txt
    crushtool -c map.txt -o newmap.bin
    ceph osd setcrushmap -i newmap.bin
    

🔹 RADOS & RGW

  • Check RGW Users:
    radosgw-admin user list
    
  • Create a User:
    radosgw-admin user create --uid="testuser" --display-name="Test User"
    
  • Get User Key:
    radosgw-admin key create --uid="testuser" --key-type=s3 --gen-access-key --gen-secret
    
  • Check Bucket Stats:
    radosgw-admin bucket stats --bucket=mybucket
    

🔹 MDS (CephFS)

  • List Filesystems:
    ceph fs ls
    
  • Check MDS Status:
    ceph mds stat
    
  • Create Filesystem:
    ceph fs new cephfs cephfs_metadata cephfs_data
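
  • Create the Backing Pools First (a sketch; pool names and PG counts are examples, and ceph fs new expects both pools to already exist):
    ceph osd pool create cephfs_metadata 32
    ceph osd pool create cephfs_data 64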
    

🔹 Debug & Troubleshooting

  • Cluster Health:
    ceph health detail
    
  • Slow Requests (see also the drill-down example below):
    ceph health | grep slow
    
  • Check for Scrubbing / Recovery:
    ceph status | grep -E "recovery|scrub"
    
  • Detailed Logs:
    ceph -w       # Watch cluster events
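
  • Drill Into Slow Ops on One OSD (run on the host where that OSD lives; <id> is a placeholder):
    ceph daemon osd.<id> dump_ops_in_flight     # operations currently in flight
    ceph daemon osd.<id> dump_historic_ops      # recent completed ops, useful for spotting slow ones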
    

🔹 Quick Recovery Scenarios

🟢 MON Down

  • Check quorum:
    ceph quorum_status | jq '.quorum_names'
    
  • Restart MON service:
    systemctl restart ceph-mon@<host>
    
  • If still failing, remove and re-add MON:
    ceph mon remove <mon_id>
    ceph mon add <mon_id> <ip:port>
    

🟡 OSD Down / Lost

  • Restart OSD:
    systemctl restart ceph-osd@<id>
    
  • Mark OSD in if safe:
    ceph osd in <osd_id>
    
  • If permanently failed:
    ceph osd purge <osd_id> --yes-i-really-mean-it
    

🟠 PGs Stuck in peering / degraded

  • Check PGs:
    ceph pg dump | grep <pg_id>
    
  • Force PG repair:
    ceph pg repair <pg_id>
    
  • Kick stuck OSD:
    ceph osd out <osd_id>
    

🔴 Full Cluster / No Free Space

  • Check space:
    ceph df
    ceph osd df
    
  • Add new OSD(s) (see the example after this list).
  • Set nearfull / full ratio to safe values:
    ceph osd set-nearfull-ratio 0.85
    ceph osd set-full-ratio 0.95
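
  • Example: adding an OSD (deployment-dependent; host and device names are placeholders):
    ceph orch daemon add osd <host>:/dev/<device>    # cephadm-managed clusters
    ceph-volume lvm create --data /dev/<device>      # non-cephadm deployments, run on the OSD host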
    

🔧 Stuck Recovery / Backfill

  • Pause recovery temporarily:
    ceph osd set norecover
    
  • Resume after troubleshooting:
    ceph osd unset norecover
    

🔹 Tuning Flags Cheat Sheet

Flag         | Command Example           | Use Case
noout        | ceph osd set noout        | Prevents OSDs marked down from being automatically marked out.
nobackfill   | ceph osd set nobackfill   | Prevents backfilling to avoid heavy IO load during upgrades or testing.
norebalance  | ceph osd set norebalance  | Stops automatic data balancing.
norecover    | ceph osd set norecover    | Disables recovery processes.
pause        | ceph osd set pause        | Pauses client IO cluster-wide (use very carefully!).
noscrub      | ceph osd set noscrub      | Disables scrubbing temporarily.
nodeep-scrub | ceph osd set nodeep-scrub | Disables deep scrubbing temporarily.

Always unset flags when done, e.g.:

ceph osd unset noout
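
A typical single-node maintenance window looks roughly like this (a sketch; ceph-osd.target assumes systemd-managed OSD services on the node being serviced):

ceph osd set noout                  # keep down OSDs from being marked out
ceph osd set norebalance            # avoid data movement while the node is offline
systemctl stop ceph-osd.target      # on the node under maintenance
# ... perform maintenance / reboot ...
systemctl start ceph-osd.target
ceph osd unset norebalance
ceph osd unset noout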

🔹 Legend (Ceph Terminology)

  • MON (Monitor) – Maintains cluster maps, ensures quorum.
  • OSD (Object Storage Daemon) – Stores data, handles replication, recovery, backfill, rebalancing. One OSD = one disk.
  • PG (Placement Group) – Logical grouping of objects across OSDs. Helps map data to OSDs.
  • CRUSH Map – Algorithm/map that decides data placement across OSDs.
  • MDS (Metadata Server) – Manages metadata for CephFS (directories, file ownership, permissions).
  • RADOS – Reliable Autonomic Distributed Object Store (core of Ceph).
  • RGW (RADOS Gateway) – S3/Swift-compatible object storage gateway.
  • Scrubbing – Consistency check between objects and replicas (light = metadata only, deep = full data).
  • Backfill – Process of redistributing data to OSDs when new OSDs are added or after recovery.
  • Recovery – Process of replicating data when an OSD fails or comes back online.

🔹 Best Practices

  • Monitor daily:
    ceph status
    ceph health detail
    
  • Always check PG health after adding/removing OSDs.
  • Run ceph osd df before expansion to check how data is currently distributed.
  • Regularly backup (see the sketch after this list):
    • ceph.conf
    • Keyrings
    • Monitor DB
  • Schedule scrubbing and monitor for stuck PGs.
  • Avoid pools with <64 PGs in production.
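
  • Backup sketch (the /backup/ destination and paths assume a default package-based install; stop a monitor only if quorum will still hold):
    cp /etc/ceph/ceph.conf /backup/
    ceph auth export client.admin -o /backup/ceph.client.admin.keyring   # repeat for other keyrings you rely on
    systemctl stop ceph-mon@<host>                          # stop ONE monitor while the others keep quorum...
    tar czf /backup/mon-db-<host>.tgz /var/lib/ceph/mon/    # ...and archive its store
    systemctl start ceph-mon@<host>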