Teaching language models to think in 3D
AlphaSpace lets language models understand and manipulate objects in 3D space without needing eyes.
Traditional robots need computer vision to see objects. We've created a different approach: structured tokens that encode spatial information, allowing language models to reason about position, orientation, and physical relationships between objects.
AlphaSpace uses a hierarchical position-based token system:
- Global position tokens: Represent a 25x25 grid (<|row-col|>, e.g., <|5-10|>)
- Local position tokens: Provide 4x4 fine-grained positioning within each cell (<|local-row-col|>, e.g., <|local-2-3|>)
- Object attribute tokens: Encode object properties (<|color|><|object|>, e.g., <|red|><|cube|>)
- State tokens: Special indicators like <|empty|>, <|origin|>, and <|target|>
Our specialized training teaches models to manipulate these tokens to solve spatial problems.
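To make the encoding concrete, here is a minimal sketch of how a single object could be serialized into these tokens. It assumes a 100x100 workspace in which each of the 25x25 global cells is subdivided into 4x4 local positions; the repository's actual serializer (tokenize_desk in utils) may differ in detail.

def position_tokens(x: int, y: int) -> str:
    # Assumed mapping: 25 global rows/cols, each split into 4 local rows/cols (25 * 4 = 100)
    global_row, local_row = divmod(y, 4)
    global_col, local_col = divmod(x, 4)
    return f"<|{global_row}-{global_col}|><|local-{local_row}-{local_col}|>"

def object_tokens(color: str, shape: str, x: int, y: int) -> str:
    # Attribute tokens come first, followed by the hierarchical position tokens
    return f"<|{color}|><|{shape}|>" + position_tokens(x, y)

print(object_tokens("red", "cube", 51, 43))
# <|red|><|cube|><|10-12|><|local-3-3|>

Under this assumed mapping, the two-level scheme keeps the vocabulary small (625 global cells plus 16 local offsets) while still addressing every position in the workspace.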
- No cameras needed: Pure language-based spatial reasoning
- Precise manipulation: Robots can position objects at exact coordinates
- Natural instructions: Tell robots what to do in plain language
- Generalizes well: Works with novel objects and spatial arrangements
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch
from utils import tokenize_desk, SYSTEM_PROMPT

# Path to the AlphaSpace checkpoint and the device to run on (placeholders; set these for your setup)
model_path = "path/to/alphaspace-checkpoint"
device = "cuda" if torch.cuda.is_available() else "cpu"

# Load the model and tokenizer
model = AutoModelForCausalLM.from_pretrained(model_path, torch_dtype=torch.bfloat16).to(device)
tokenizer = AutoTokenizer.from_pretrained(model_path)
# Define your workspace: each entry maps a color-shape label to that object's coordinates on the table
objects = [
    {"red-cube": [51, 43, 17]},
    {"black-cube": [44, 58, 17]},
    {"purple-cube": [74, 59, 17]},
    {"green-cube": [65, 82, 17]},
]
# Give a natural-language instruction referencing objects in the workspace
instruction = "Throw the red cube on top of the green cube"
# Serialize the workspace into spatial tokens and build the full prompt
desk, object_height = tokenize_desk(objects)
final_instruction = SYSTEM_PROMPT.format(object_height=object_height, instruction=instruction, TABLE_MAP=desk)
chat = [
    {"role": "user", "content": final_instruction.strip()}
]
tokenized_chat = tokenizer.apply_chat_template(chat, tokenize=True, add_generation_prompt=True, use_system_prompt=False, return_tensors="pt")
generated_ids = model.generate(
    tokenized_chat.to(device),
    max_new_tokens=2048,
    do_sample=False,  # greedy decoding; sampling parameters such as temperature only apply when do_sample=True
)
# Get the solution
result = tokenizer.decode(generated_ids[0][tokenized_chat.shape[1]:], skip_special_tokens=True)
print(result)
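If you need numeric workspace coordinates back from the generated text, a small post-processing helper along the following lines can recover them. This is a sketch: it assumes the output still contains hierarchical position tokens (decode with skip_special_tokens=False if they are registered as special tokens) and uses the same assumed 25x25 / 4x4 mapping as the earlier sketch.

import re

def decode_position_tokens(text: str):
    # Pair each global token <|r-c|> with the <|local-r-c|> token that immediately follows it
    pattern = re.compile(r"<\|(\d+)-(\d+)\|><\|local-(\d+)-(\d+)\|>")
    coords = []
    for g_row, g_col, l_row, l_col in pattern.findall(text):
        y = int(g_row) * 4 + int(l_row)  # invert the assumed global/local split
        x = int(g_col) * 4 + int(l_col)
        coords.append((x, y))
    return coords

print(decode_position_tokens("<|10-12|><|local-3-3|>"))  # [(51, 43)]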
We're working on:
- Dynamic scene understanding
- Multi-step manipulation planning
- Physics-aware spatial reasoning
We're looking for collaborators and plan to expand the model's capabilities to additional spatial tasks. Check out the issues to get started.
@misc{dao2025alphaspaceenablingroboticactions,
      title={AlphaSpace: Enabling Robotic Actions through Semantic Tokenization and Symbolic Reasoning},
      author={Alan Dao and Dinh Bach Vu and Bui Quang Huy},
      year={2025},
      eprint={2503.18769},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2503.18769},
}
AlphaSpace was created by Menlo Research. If you use it in your research, please cite our paper.