DreamText

We introduce DreamText, an image-to-3D generative model that leverages text descriptors as an intermediary step for 3D reconstruction. Without 3D training data. Our approach involves training a neural radiance field, which subsequently enables the rendering of novel views of the reconstructed 3D object which appears in the image. To accomplish this we employ a pretrained diffusion model to generate diverse object views, which serve as training data for this neural radiance field. To condition the generated views to closely resemble the image object, we pretrain the conditioning input, the text, of the diffusion model. Thus as the 3D regeneration comes from ”dreaming” novel views using pretrained text, we call our model DreamText.

Samples can bee seen at this page.

Code replication

This repository is based on this other repository. Before working on Stable Diffusion, one should ensure that their computer has the CUDA libraries installed and a graphics card with at least 12 GB of memory in order to successfully run the code. Some libraries have dependencies on the others and the requirements.txt, obtained with pip freeze, might not work properly. We recommend to run it one time and then install the missing libraries manually. Also the exact version of CUDA should be checked, and torch libraries that you need for your GPU. Our experiments were performed using Ubuntu 22.04 wich Nvidia driver version: NVIDIA-SMI 525.125.06 driver CUDA Version: 12.0 \newline

There are two scripts that has to be executed to run our code: to train the text embedding and to make the 3D reconstruction once the text embedding is trained. The following code contains two imperative variables: \textit{WORKSPACE} and \textit{PATH}. First is used to indicate where all the generated data will be stored, second is used to indicate where the training image is located. Besides, for DreamText, we recommend to add noun and quantifier parameters to specify this values to the network, although they can be automatically predicted. The following snipet of code generates the sentence that describes the image and trains it:

python ./descriptor.py --workspace WORKSPACE --noun watch --quantifier a --image_path PATH

Once the sentence is generated and trained the 3D reconstruction can be made. It is imperative to provide the same workspace directory for this. The following snipet of code reconstructs the 3D object after training the descriptive sentence:

python main.py --workspace WORKSPACE -O --h 256 --w 256 --iters 10100 --load_token 1

Samples can be visualized in a graphical user interface using:

python main.py -O --test --gui --workspace WORKSPACE

We have uploaded 3 samples: examples/example1, examples/example2 and examples/example3. Change the WORKSPACE in the code above by one of these two directories to visualize them.

Name		Name	Last commit message	Last commit date
Latest commit History 5 Commits
__pycache__		__pycache__
assets		assets
examples		examples
freqencoder		freqencoder
gridencoder		gridencoder
nerf		nerf
raymarching		raymarching
scripts		scripts
shencoder		shencoder
taichi_modules		taichi_modules
LICENSE		LICENSE
README.md		README.md
activation.py		activation.py
adjectives.txt		adjectives.txt
descriptor.py		descriptor.py
encoding.py		encoding.py
extract-mask.py		extract-mask.py
gradio_app.py		gradio_app.py
main.py		main.py
meshutils.py		meshutils.py
nounlist.txt		nounlist.txt
optimizer.py		optimizer.py
prompts.json		prompts.json
requirements.txt		requirements.txt
sd.py		sd.py
stable_diffusion.py		stable_diffusion.py
trace.py		trace.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

DreamText

Code replication

About

Uh oh!

Releases

Packages

Uh oh!

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

DreamText

Code replication

About

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Uh oh!

Contributors

Uh oh!

Languages

Packages