08.03.2025 13:01

Project Caretaker. Part 2. Design

Given my lack of advanced artistic skills, I'll turn to external help here. Fortunately, there's a wide range of options available.

Defining the Requirements

I've been nurturing the idea of creating a certain device for a long time. The requirements were minimal - a front camera, tracks, a WiFi antenna, and an interesting design (minimalism, futurism, cyberpunk, etc.).

With these ideas in mind, I asked ChatGPT for a detailed description of the device's appearance based on these requirements. I mentioned up front that the description would be used for image generation, so it shouldn't include anything irrelevant to an image (functional purpose, operational logic, control algorithms, etc.). The request looked like this:

I want to make a tracked robot with a rotating camera and WiFi antenna.

Write 10 variations of prompts for generating an image of my robot design in English. The prompts should be as varied as possible.

Along the way, I adjusted my requirements. For example, at some point I dropped the rotating camera: wide-angle and even regular OV2640 cameras have a sufficient viewing angle, so rotation isn't needed. I also asked ChatGPT to give the designs anthropomorphic qualities, add emotions, and use various styles.
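To get a feel for why a wider lens can replace a rotating mount, here is a rough sketch of the geometry: the width of the scene visible at a given distance for a given horizontal field of view. The function name and the angles in the loop are assumptions for illustration, not measured OV2640 values; the real angle depends on the lens fitted to the module.

```python
import math

def coverage_width(distance_m: float, hfov_deg: float) -> float:
    """Width of the scene visible at a given distance for a horizontal field of view."""
    return 2.0 * distance_m * math.tan(math.radians(hfov_deg) / 2.0)

# Hypothetical lens angles for comparison; check the datasheet of the actual lens.
for hfov in (60.0, 120.0):
    print(f"{hfov:.0f} deg -> {coverage_width(1.0, hfov):.2f} m wide at 1 m")
```

The wider the angle, the more of the scene fits in the frame without moving the camera, at the cost of fewer pixels per object.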

In the end, I collected and used 150 prompts.

If you're interested, you can check out the dialogue in ChatGPT. All the generated prompts can be found in the project on GitHub.

txt2image

During the hype around Flux, I set up Forge locally with flux.dev. I mentioned this in my article about training a Flux LoRA on Civitai (a LoRA that you can download and use locally). My 3060 Ti with 8 GB of video memory generates a 1024x1024 image in about a minute.

For each of the 150 prompts, 10 different images were generated, so the total number to choose from came to 1,500.
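The arithmetic behind this batch can be sketched as follows. This is a minimal illustration, not the script actually used: the placeholder prompt strings, the dictionary keys, and the seed scheme are all assumptions, and the per-image time is the rough "about a minute" figure quoted above.

```python
import itertools

IMAGES_PER_PROMPT = 10
SECONDS_PER_IMAGE = 60  # rough figure: about a minute per 1024x1024 image

def build_jobs(prompts, images_per_prompt=IMAGES_PER_PROMPT, base_seed=0):
    """Return one generation job per (prompt, seed) pair."""
    jobs = []
    pairs = itertools.product(enumerate(prompts), range(images_per_prompt))
    for (p_idx, prompt), n in pairs:
        jobs.append({
            "prompt": prompt,
            # A distinct seed per image gives 10 different results per prompt.
            "seed": base_seed + p_idx * images_per_prompt + n,
            "width": 1024,
            "height": 1024,
        })
    return jobs

# Placeholder prompts standing in for the 150 real ones.
prompts = [f"placeholder prompt {i}" for i in range(150)]
jobs = build_jobs(prompts)
total_hours = len(jobs) * SECONDS_PER_IMAGE / 3600
print(len(jobs), "images, roughly", total_hours, "hours of generation")
```

At that rate the whole set takes on the order of a day of GPU time, so it makes sense to run it unattended.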

Since I didn't read every prompt, some series of generations turned out far from both reality and what I wanted. Here's an example of 100 random images from the entire set:

205-collage
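A contact sheet like this can be planned with a few lines of code. The sketch below only picks 100 random files and computes where each tile would go in a 10x10 grid; the file names, tile size, and function name are hypothetical, and the actual pasting of pixels would need an image library of your choice.

```python
import random

def collage_layout(paths, count=100, columns=10, tile=128, seed=None):
    """Pick `count` random images and assign each a top-left pixel position in a grid."""
    rng = random.Random(seed)
    chosen = rng.sample(list(paths), count)  # sampling without repetition
    return [
        (path, (i % columns) * tile, (i // columns) * tile)
        for i, path in enumerate(chosen)
    ]

# Hypothetical file names standing in for the 1,500 generated images.
all_images = [f"gen_{i:04d}.png" for i in range(1500)]
layout = collage_layout(all_images, seed=42)
print(layout[0], layout[-1])
```

Fixing the seed makes the same selection reproducible, which is handy if the collage has to be regenerated at a different tile size.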

Most of the generations resembled military equipment (which is logical, since that is one of the most common types of tracked vehicles). Some looked like printers or MFPs on wheels (in the bottom left corner): huge smooth boxes. Some turned out completely unrealistic or overloaded with detail, and some resembled a spaceship that I could never model.

That's exactly why so many generations were needed. Nevertheless, there were 10 candidates to choose from, which you can see on GitHub. I settled on the following:

208-favorite-design

It has a certain charisma to it - I hope everyone will behave properly under its supervision.

What to do next? The shape is still complex, and modeling it from scratch in CAD (Fusion360/SolidWorks/etc) would be difficult or time-consuming for me. I'm not familiar enough with Blender or other modeling tools to model everything from scratch based on a single image.

But since we've started using AI - why stop? I searched Google for "image 2 model online" and picked several options I liked (some of which I had already heard about and even tried in one of my posts).

image2model

All the generators described below can create models from both text and images. They differ in features and pricing plans, but each of them lets you generate a model from an image for free, download it as STL or FBX, and even attempts to texture it.

meshy

Meshy first offers a choice of 4 draft models, and the one you pick is then refined in more detail. I chose the upper right one, as it looked closest to the reference.

211-meshy

Then it spends some time thinking, detailing, texturing, and all that. Ultimately, it produces something horrific:

210-meshy

It looks like it was made of plasticine and then melted in the heat or was crushed - it goes straight into the trash. Let's move on.

Hyper3d aka Rodin

The generation scheme is the same: upload an image, wait, and go through several stages (structure, texturing), though there seem to be more options here. Also, according to chats and reviews in the AI community, Rodin is one of the top services for model generation.

214-rodin

The result looks very similar to the reference. Let's download the model file and rotate it in an editor (I used OrcaSlicer, which I happened to have open):

215-rodin

216-rodin

Much better. There's a bit of noise on the front, but the tracks and wheels are smooth and well-defined, and the underside and chassis mounts look almost plausible and functional. The body itself, the "face," has lost detail, though: some of the slots and gaps that give the reference its expressiveness are missing. The camera block also looks rather crude. I'll keep this one in mind for now, but let's see what else is available.

Tripo3d

Works in a similar way - upload an image, wait longer than with all the others, confirm the preliminary version, and get a textured model as output:

218-tripo3d

It looks very similar to the reference, and the charisma is preserved. Let's take a look at the model itself:

219-tripo3d

By the way, the model is public; you can view and rotate it on the tripo3d website.

The bottom, of course, has nothing to do with reality, but that's not so important. The tracks are crooked, and the rollers don't even sit against the tracks. But tracks are relatively easy to model in CAD: they are parametric parts defined by a few measurable dimensions, as is the entire chassis.
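As an illustration of what "parametric" means here, the loop length of a simple track can be derived from two dimensions. The sketch below assumes the simplest layout, two equal wheels with the track wrapped around them, and ignores track thickness; the function names and the example dimensions are hypothetical, not taken from the actual design.

```python
import math

def track_length(axle_distance: float, wheel_diameter: float) -> float:
    """Loop length around two equal wheels: two straight runs plus one full circumference."""
    return 2.0 * axle_distance + math.pi * wheel_diameter

def link_count(length: float, pitch: float) -> int:
    """Number of whole links of a given pitch needed to close the loop."""
    return math.ceil(length / pitch)

# Hypothetical dimensions in millimetres.
length = track_length(axle_distance=100.0, wheel_diameter=40.0)
links = link_count(length, pitch=10.0)
print(f"track length {length:.2f} mm, {links} links")
```

Change the axle distance or the wheel diameter and everything else follows, which is exactly what makes this part of the robot easier to rebuild in CAD than to salvage from a generated mesh.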

Since we've generated 1,500 images, what's stopping us from doing the same with models? The limits are the obstacle: the demo only gives you a limited number of credits. The process also takes a long time and requires attention. Still, I'll show one more variant from tripo3d to demonstrate the variability:

220-tripo3d

The differences are clearly minor, and we won't get a perfect model anyway, so I settled on the variant on the right.

It's also obvious that these models can't be used as is. They are noisy (I won't even show the mesh), and all of this will have to be remodeled, but as a foundation it's very good. I wouldn't have been able to model everything from a single picture anyway.
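Before remodeling, it can help to check how heavy a downloaded mesh is. Here is a minimal sketch that reads a binary STL and reports the triangle count and bounding box using only the standard library. It assumes the export is a binary STL (ASCII STL files, which start with the word "solid", are not handled), and the function name is my own.

```python
import struct

def stl_stats(data: bytes):
    """Return (triangle_count, bbox_min, bbox_max) for a binary STL given as bytes."""
    # Binary STL layout: 80-byte header, then a little-endian uint32 triangle count.
    (count,) = struct.unpack_from("<I", data, 80)
    lo = [float("inf")] * 3
    hi = [float("-inf")] * 3
    offset = 84
    for _ in range(count):
        # Each record is 50 bytes: normal (3 floats), 3 vertices (9 floats), uint16 attribute.
        values = struct.unpack_from("<12fH", data, offset)
        for vertex in range(3):
            for axis in range(3):
                coord = values[3 + 3 * vertex + axis]
                lo[axis] = min(lo[axis], coord)
                hi[axis] = max(hi[axis], coord)
        offset += 50
    return count, tuple(lo), tuple(hi)

# Usage with a hypothetical file name:
# with open("model.stl", "rb") as f:
#     print(stl_stats(f.read()))
```

A very high triangle count on a small part is a quick hint that the surface is noisy and better used as a visual reference than as printable geometry.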

Stay tuned for upcoming articles about the modeling process and the project development as a whole.

Tags: Diy AI ChatGPT Flux
